FIRST EXPERIENCE WITH SEAFILE SYNC & SHARE SOLUTION


FIRST EXPERIENCE WITH SEAFILE SYNC & SHARE SOLUTION Maciej Brzeźniak, Stanisław Jankowski, Sławomir Zdanowski, Krzysztof Wadówka box.psnc.pl PSNC, Poznań, Poland

AGENDA
- NRENs in the sync & share business: why should NRENs provide it? the dominating technology and alternatives
- Why Seafile?: Seafile data model, architecture and features; possible challenges and decision factors
- Seafile experience: user experience; operational aspects
- Future plans

NRENS IN THE SYNC & SHARE BUSINESS

WHAT IS CLOUD?

DROPBOX FROM THE NREN: ADDED VALUES
- trusted service from the NREN, with an SLA
- proximity: e.g. PL vs IE/US (latency, bandwidth)
- space limits (5-10 GB in public clouds is not enough)
- cost per TB
- integration with other NREN services (incl. AAI)

APPROACH (MOST POPULAR)
- take ownCloud as the basis
- add (or not) your own developments
- use the free or the enterprise version (400k licenses in the GÉANT-ownCloud agreement)
- motivations: large community (see e.g. the CS3 initiative); participation in development is possible; the storage back-end can be shared by various applications
- features: access to the back-end file system is possible; data model not specialised for sync & share

ANOTHER APPROACH: SEAFILE-BASED BOX.PSNC.PL

WHY SEAFILE?
A specialised solution, designed to be good at sync & share (not necessarily at other things):
- reliable: data model and synchronisation algorithm
- effective: low-level implementation (in C)
- low-overhead: minimum data in the DB (only shares, etc.); metadata kept in the storage backend
Features:
- can speak to a filesystem, NFS, S3, or Swift/Ceph at the backend
- imposes its own data organisation
Mature, targeted at the enterprise market, also used in academia.

WHY SEAFILE FOR SYNC&SHARE? SYNCHRONISATION BASED ON SNAPSHOTS NOT PER-FILE VERSIONING
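In other words, every sync point is a snapshot (commit) of the whole library - a mapping of paths to content hashes - and synchronisation reduces to diffing two snapshots, with no per-file version chains to maintain. A minimal Python illustration of the idea (not Seafile's actual data structures):

```python
import hashlib

def snapshot(files: dict) -> dict:
    """One 'commit': map every path in the library to a hash of its content."""
    return {path: hashlib.sha1(data).hexdigest() for path, data in files.items()}

def diff(old: dict, new: dict):
    """Compare two snapshots; only the differences need to be synchronised."""
    added    = [p for p in new if p not in old]
    removed  = [p for p in old if p not in new]
    modified = [p for p in new if p in old and new[p] != old[p]]
    return added, removed, modified

v1 = snapshot({"a.txt": b"hello", "b.txt": b"world"})
v2 = snapshot({"a.txt": b"hello!", "c.txt": b"new"})
print(diff(v1, v2))  # (['c.txt'], ['b.txt'], ['a.txt'])
```

The point of the snapshot model is that detecting changes is a comparison of two small mappings, not a scan of per-file version histories.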

WHY SEAFILE FOR SYNC&SHARE? ONLY DELTAS INCLUDED IN COMMITS; A CONTENT-DEFINED CHUNKING ALGORITHM IS USED FOR DEDUP
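The principle behind content-defined chunking can be shown with a toy chunker: cut points are chosen by a condition on the bytes themselves rather than at fixed offsets, so data inserted near the start of a file changes only the first chunk while the rest still deduplicates. This is a sketch of the idea only - Seafile's real algorithm uses a proper rolling fingerprint and different parameters:

```python
def cdc_chunks(data: bytes, window: int = 4, mask: int = 0xFF) -> list:
    """Toy content-defined chunking: cut wherever the masked sum of the last
    `window` bytes equals zero, so boundaries depend only on local content."""
    chunks, start = [], 0
    for i in range(window - 1, len(data)):
        if (sum(data[i + 1 - window:i + 1]) & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# A block whose four trailing zero bytes trigger the boundary condition:
block = b"ABCDEFGH\x00\x00\x00\x00"
plain = cdc_chunks(block * 10)            # ten identical chunks
shifted = cdc_chunks(b"XY" + block * 10)  # the prefix changes only the first chunk
print(set(plain) & set(shifted) == {block})  # True -> dedup survives the shift
```

With fixed-size blocks, the two-byte prefix would have shifted every block and defeated deduplication entirely.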

WHY SEAFILE FOR SYNC&SHARE? SEAFILE SERVER ARCHITECTURE Main modules: Seafile Daemon - the data service; Seahub - the web application; Ccnet server - the RPC service daemon

WHY SEAFILE FOR SYNC&SHARE? POSSIBLE CHALLENGES (TECHNICAL)
- custom data format -> the storage backend can't easily be shared between sync & share and direct filesystem access
- this may exclude Seafile for large (existing) datasets (see CERN, AARNet)
- BUT the specialised architecture enables scaling the service to a large size: users, files, servers

WHY SEAFILE FOR SYNC&SHARE? POSSIBLE CHALLENGES (NON-TECHNICAL)
- smaller community, BUT growing; in Europe, led by Humboldt-Universität zu Berlin; active user forum: https://forum.edu.seafile.de
- development model: source code on GitHub, see contributions: https://github.com/haiwen/
- experience from requesting features is very positive: e.g. SAML/Shibboleth etc.
(Diagram: Seafile desktop client, Seahub - the Seafile web interface, Seafile server core)

WHY SEAFILE FOR SYNC&SHARE? OUR DECISION FACTORS
- USERS want the Dropbox equivalent - not something else!
- WE@PSNC want to fully exploit users' proximity to our DC and the network bandwidth
- PERFORMANCE is what users appreciate, especially if they deal with many and large files (see next slides)
- RELIABILITY is what they expect! failures are not forgiven

WHY NOT OTHER SOLUTIONS? A MULTI-PURPOSE NATURE IS NOT ALWAYS AN ADVANTAGE Source: http://www.fastcarinvasion.com/must-see-moment-tractor-crosses-way-racing-car/

PERFORMANCE & RESOURCE USAGE COMPARISON

DISCLAIMER

DISCLAIMER ON THE PERFORMANCE TESTING PROCEDURE
- We tested the most popular / most interesting solutions
- We put effort into making the tests as objective as possible
- Solutions were tested under the same conditions (server, backend, OS)
- The test procedure was identical to the extent possible
- We used community versions (not enterprise ones)
- No special performance tuning was performed
- No feedback from developers has been processed (yet)

SEAFILE VS OTHERS PERFORMANCE TEST: 1. SMALL FILES
Testing set: Linux kernel source v. 4.5.3: 706 MB of data, 52 881 files, 3 544 directories; metadata-intensive.
Test scenario (measured operations in bold on the slide):
- client 1: kernel source unpacked
- client 1: data set upload
- client 2: data set download
- client 1: random data removal (388 dirs, 4 328 files)
- client 1: removal propagation
- client 2: change detection
Our testing environment is described in the backup slides.
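For orientation, per-operation timings in a test like this need nothing more than wall-clock timestamps around each step plus an item count to normalise by; a sketch of such a harness (illustrative only, not the actual PSNC test scripts):

```python
import os
import time

def count_items(root: str) -> int:
    """Files plus directories under root - the denominator for items/s rates."""
    total = 0
    for _path, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total

def timed(label: str, fn):
    """Run fn(), report the elapsed wall-clock time, return (result, seconds)."""
    t0 = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - t0
    print(f"{label}: {elapsed:.1f} s")
    return result, elapsed

# e.g. time the unpack/upload step, then poll the sync client until it reports
# "up to date" and divide count_items(<synced dir>) by the elapsed seconds.
```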

SEAFILE VS OTHERS SMALL FILES PERFORMANCE TEST (TIME)

TIME                         Seafile [seconds]   theother [seconds]   theother [minutes]
Client 1: upload             90                  1800                 35
Client 2: download           60                  1320                 22
Client 1: folder removal     1-2                 60-180               1-3
Client 2: change detection   1-2                 4                    0.0(6)

SEAFILE VS OTHERS SMALL FILES PERFORMANCE TEST (SPEED)

SPEED                        Seafile [files+dirs/s]   theother [files+dirs/s]   difference
Client 1: upload             627                      27                        23x
Client 2: download           940                      43                        22x
Client 1: folder removal     25 000-50 000            26-78                     181-640x
Client 2: change detection   2358-4716                1179                      2-4x
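As a sanity check, the rates in this table follow directly from the data-set size (52 881 files + 3 544 directories = 56 425 items) divided by the times on the TIME slide:

```python
files, dirs = 52_881, 3_544
items = files + dirs        # 56 425 files + directories in the kernel tree

print(round(items / 90))    # 627 items/s -> Seafile upload (90 s)
print(round(items / 60))    # 940 items/s -> Seafile download (60 s)

# The "difference" column is simply the ratio of the two solutions' rates:
print(round(627 / 27))      # 23 -> the "23x" upload entry
```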

SEAFILE VS OTHERS PERFORMANCE COMPARISON TEST: 2. LARGE FILES
Testing set: 5 x 1 GB files, 5 GB of data; not metadata-intensive.
Test scenario (measured operations in bold on the slide):
- client 1: upload
- client 2: download
Our testing environment is described in the backup slides.

SEAFILE VS OTHERS LARGE FILES PERFORMANCE TEST (TIME & SPEED)

TIME                    Seafile [s]      theother [s]
5x1GB file upload       36               56
5x1GB file download     21               8.5

SPEED                   Seafile [GB/s]   theother [GB/s]
5x1GB file upload       0.17             0.11
5x1GB file download     0.29             0.71

SERVER RESOURCE USAGE: SEAFILE VS OTHERS - SMALL FILES TEST Seafile server: ~4.5% during 10 minutes; theother server: ~10% during half an hour

WHY DOES THIS MATTER? PERFORMANCE & RESOURCE USAGE
- At present we are targeting the academic sector: researchers and staff, then students
- In the future we might even target primary/secondary schools with cloud services
- Excess resource usage might become very costly

USER EXPERIENCE

USER EXPERIENCE: GENERAL
USERS wanted a Dropbox equivalent - not something else!
USERS obtained a REALLY FAST Dropbox equivalent:
- no load on the clients
- many library / local dir pairs can be configured
- the Thunderbird Filelink plug-in is widely used & appreciated!
Usage: web, Filelink, desktop clients, mobiles
Use cases:
- syncing across devices
- sharing (e.g. books to a Calibre folder)
- content sharing and provisioning - simplicity! (PSNC PR dept.)
- even large data backups (~0.5 TB)

USER EXPERIENCE: POWER USERS' STORIES
USER 1 - Adam Mickiewicz University in Poznan, Faculty of Physics - 230 GB @BOX
- syncing computation results: 30 GB/file
- plus: documents, publications, presentations, and analyses of computing results
- user comment: "I would appreciate no limits!" (100 GB would not solve the problem)
USER 2 - Adam Mickiewicz University in Poznan, Faculty of Physics - 550 GB @BOX
- syncing simulation results between 2 computers
- backup copies (!) of the data
- user comment: "BOX makes it easy to safely store data, and to sync and transfer it between 2 machines"
USER 3 - PR dept. @PSNC - 128 GB @BOX
- syncing photos and graphic files: 20 KB-4 GB/file, 38k files
- planned: serving content (images) for WordPress-based websites directly from the Seafile servers
- user comment: "no limits! keep backups and versions!"

OPERATIONAL EXPERIENCE

OPERATIONAL EXPERIENCE: IT WORKS LIKE A CHARM!
- very low load on the server: it has run on a 4 GB RAM, 1 vCPU VM for >1 year
- no interventions related to server issues; the only ones so far:
  - increasing Apache/JavaScript limits (yep, some users wanted to download a zipped 30 GB directory through the web interface)
  - adding storage space :) :) :)
- no extensive problem-solving knowledge needed, at least as long as it's not yet at a big scale

FUTURE PLANS

FUTURE PLANS: IN-LAB WORK
FURTHER SCALABILITY TESTS:
- clustered setup with load-balancing & HA
- web interface tests (desktop clients only, so far)
- Seafile with various backends: Ceph, IBM GPFS, EMC ScaleIO+clusterFS
SECURITY AUDIT:
- 1st phase: very promising results
- 2nd phase: until June/July 2016

FUTURE PLANS: BATTLEFIELD WORK
ROLLOUT:
- to more universities (a larger pilot)
- BOX.PSNC.PL -> BOX.PIONIER.NET.PL
INTEGRATION WITH FEDERATED AAI:
- successfully tested with eduGAIN under Windows, Linux, MacOS and Android
- AAI to be tested on more platforms

CONCLUSIONS

CONCLUSIONS
- We decided to use Seafile as a performant, reliable and specialised sync & share solution
- We're testing scalability in the laboratory and in real life (pilot since 2014)
- We're collecting numbers, observations and user/ops experience, and we are open to sharing them
- We're working on various ways of integrating the service with users' workflows - also to be shared

MESSAGE TO NRENS

MESSAGE
- As a community we need diversity in technologies and approaches
- We address various use cases, so the generality vs specialisation discussion will keep happening
- While using one(cloud) ;) technology, we need to have a look at the alternatives
- Sharing is caring, so we should stay open to trying new things and to sharing both good and bad experiences
- There are forums for sharing real-life experience: the OpenStack Operators forum [OSO], TF-Storage -> TF-Storage&IaaS

FIRST EXPERIENCE WITH SEAFILE SYNC & SHARE SOLUTION THANK YOU! box.psnc.pl Maciej Brzezniak, Stanislaw Jankowski, Sławomir Zdanowski, Krzysztof Wadówka PSNC, Poznan, Poland

MORE PERFORMANCE FIGURES IN THE BACKUP SLIDES

BACKUP SLIDES (PERFORMANCE MONITORING)

SMALL FILES TEST (LINUX KERNEL SOURCE)

SEAFILE RESOURCE USAGE, VERSION 2 (monitoring graphs: Client 1, Client 2, Server)

THEOTHER RESOURCE USAGE, VERSION 2 (monitoring graphs: Client 1, Client 2, Server)

CLIENT-CLIENT COMPARISON: SEAFILE VS THEOTHER (graphs: Seafile client 1 vs theother client 1)

CLIENT-CLIENT COMPARISON: SEAFILE VS THEOTHER (graphs: Seafile client 2 vs owncloud client 2)

SERVER-SERVER COMPARISON: SEAFILE VS THEOTHER (graphs: Seafile server vs theother server)

TESTBED CONFIGURATION

TESTBED CONFIGURATION
- 2 servers: 1 for Seafile, 1 for the others
- server model: Huawei RH2288 V3
- CPU: 2 x Intel Xeon E5-2620 v3, 2.4 GHz
- RAM: 128 GB
- interfaces: Ethernet: 2 x 10 Gbit, active/passive bonding; FC: 2 x FC 16 Gbit, multipathing
- storage: 12 x 10 TB LUN with RAID10, striped host-side with LVM
- software: Seafile version 5.1.7 (free version); ownCloud version 9.0.2 (community); OS: Ubuntu 14.04.4 LTS