FIRST EXPERIENCE WITH SEAFILE SYNC & SHARE SOLUTION
Maciej Brzeźniak, Stanisław Jankowski, Sławomir Zdanowski, Krzysztof Wadówka
box.psnc.pl
PSNC, Poznań, Poland
AGENDA
NRENs in the sync & share business: why should an NREN provide it? The dominating technology and alternatives
Why Seafile? Seafile data model, architecture and features; possible challenges and decision factors
Seafile experience: user experience, operational aspects
Future plans
NRENS IN THE SYNC & SHARE BUSINESS
WHAT IS THE CLOUD?
DROPBOX FROM THE NREN: ADDED VALUES
trusted service from the NREN
SLA
proximity, e.g. PL vs IE/US (latency, bandwidth)
space limits (the 5-10 GB offered by public clouds is not enough)
cost per TB
integration with other NREN services (incl. AAI)
THE MOST POPULAR APPROACH
take ownCloud as the basis
add (or not) own developments
use the free or the enterprise version (400k licenses in the GÉANT-ownCloud agreement)
motivations: large community (see e.g. the CS3 initiative); participation in development possible; sharing the storage back-end among various applications
features: access to the back-end file system possible; data model not specialised for sync & share
ANOTHER APPROACH: SEAFILE-BASED BOX.PSNC.PL
WHY SEAFILE?
Specialised solution designed to be good at sync & share (not necessarily at other things):
reliable: data model, synchronisation algorithm
efficient: low-level implementation (C)
low overhead: minimum data in the DB (only shares etc.), metadata in the storage back-end
Features: can use a filesystem, NFS, S3 or Swift/Ceph as the back-end; imposes its own data organisation
Mature, targeted at the enterprise market, used in academia
WHY SEAFILE FOR SYNC&SHARE? SYNCHRONISATION BASED ON SNAPSHOTS, NOT PER-FILE VERSIONING
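To illustrate what "snapshots, not per-file versioning" means, here is a minimal Python sketch assuming a git-like, content-addressed model (our simplification, not Seafile's actual object format): every commit records the state of the whole library through one root tree object and points to its parent, so unchanged files are shared between snapshots and any past state can be restored at once. ObjectStore, commit_snapshot, restore and history are hypothetical names; the flat tree and the JSON encoding are assumptions made for brevity.

```python
import hashlib
import json
from typing import Dict, List, Optional


class ObjectStore:
    """Hypothetical content-addressed store: objects are keyed by the SHA-1 of their bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        oid = hashlib.sha1(data).hexdigest()
        self.objects.setdefault(oid, data)  # identical content is stored only once
        return oid

    def get_json(self, oid: str) -> dict:
        return json.loads(self.objects[oid])


def commit_snapshot(store: ObjectStore, files: Dict[str, bytes],
                    parent: Optional[str], message: str) -> str:
    """Record the state of the whole library as one commit (flat tree for brevity)."""
    tree = {path: store.put(data) for path, data in files.items()}
    root = store.put(json.dumps(tree, sort_keys=True).encode())
    commit = {"root": root, "parent": parent, "message": message}
    return store.put(json.dumps(commit, sort_keys=True).encode())


def restore(store: ObjectStore, commit_id: str) -> Dict[str, bytes]:
    """Rebuild every file of the library as it was at the given commit."""
    tree = store.get_json(store.get_json(commit_id)["root"])
    return {path: store.objects[oid] for path, oid in tree.items()}


def history(store: ObjectStore, commit_id: Optional[str]) -> List[str]:
    """Walk the parent chain from the newest commit back to the first one."""
    messages = []
    while commit_id is not None:
        commit = store.get_json(commit_id)
        messages.append(commit["message"])
        commit_id = commit["parent"]
    return messages


store = ObjectStore()
c1 = commit_snapshot(store, {"a.txt": b"hello", "b.txt": b"world"}, None, "v1")
c2 = commit_snapshot(store, {"a.txt": b"hello", "b.txt": b"world!"}, c1, "v2")
print(history(store, c2))   # ['v2', 'v1']
print(len(store.objects))   # 7: 3 file objects, 2 trees, 2 commits
```

The second commit adds only three objects (the changed file, a new tree and the commit itself); a.txt is reused from the first snapshot.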
WHY SEAFILE FOR SYNC&SHARE? ONLY DELTAS INCLUDED IN COMMITS; CONTENT-DEFINED CHUNKING ALGORITHM USED FOR DEDUPLICATION
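A minimal Python sketch of content-defined chunking and block-level deduplication. Chunk boundaries are placed where a rolling hash over the last few bytes matches a bit pattern, so inserting data near the start of a file shifts only the first boundary and most later blocks keep their IDs. The sketch uses a simple polynomial rolling hash in place of a true Rabin fingerprint; the window, mask and minimum/maximum chunk sizes are arbitrary assumptions, not Seafile's parameters, and split/block_ids are hypothetical names.

```python
import hashlib
import random
from typing import List

# Illustrative parameters (assumptions, not Seafile's values).
WINDOW = 48             # bytes covered by the rolling hash
BASE = 257              # polynomial base
MOD = (1 << 61) - 1     # Mersenne prime modulus
MASK = (1 << 12) - 1    # cut when the low 12 bits are all ones (~1 in 4096 positions)
MIN_CHUNK = 1024        # no cut before this many bytes
MAX_CHUNK = 16384       # force a cut at this size


def chunk_boundaries(data: bytes) -> List[int]:
    """Return the end offsets of content-defined chunks of `data`."""
    drop = pow(BASE, WINDOW, MOD)  # weight of the byte sliding out of the window
    cuts, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD  # hash now covers only the last WINDOW bytes
        size = i + 1 - start
        if (size >= MIN_CHUNK and (h & MASK) == MASK) or size >= MAX_CHUNK:
            cuts.append(i + 1)
            start = i + 1
    if start < len(data):
        cuts.append(len(data))  # trailing chunk
    return cuts


def split(data: bytes) -> List[bytes]:
    chunks, prev = [], 0
    for end in chunk_boundaries(data):
        chunks.append(data[prev:end])
        prev = end
    return chunks


def block_ids(data: bytes) -> List[str]:
    """Content addresses of the chunks; equal chunks get equal IDs (deduplication)."""
    return [hashlib.sha1(c).hexdigest() for c in split(data)]


rng = random.Random(42)
original = rng.getrandbits(8 * 200_000).to_bytes(200_000, "little")
edited = b"a few new bytes" + original  # insertion at the very beginning
old, new = block_ids(original), set(block_ids(edited))
reused = sum(1 for bid in old if bid in new)
print(f"{reused} of {len(old)} blocks reused after the insertion")
```

With fixed-size blocks the same 15-byte insertion would shift every block boundary and defeat deduplication; content-defined boundaries resynchronise after the first chunk or two.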
WHY SEAFILE FOR SYNC&SHARE? SEAFILE SERVER ARCHITECTURE
Main modules:
Seafile daemon: the data service
Seahub: the web application
Ccnet server: the RPC service daemon
WHY SEAFILE FOR SYNC&SHARE? POSSIBLE CHALLENGES (TECHNICAL)
custom data format -> the storage back-end cannot easily be shared between sync & share and file system access
this may rule Seafile out for large existing datasets (see CERN, AARNet)
BUT the specialised architecture enables scaling the service to large sizes: users, files, servers
WHY SEAFILE FOR SYNC&SHARE? POSSIBLE CHALLENGES (NON-TECHNICAL)
smaller community, BUT growing; in Europe led by Humboldt-Universität zu Berlin; active user forum: https://forum.edu.seafile.de
development model: source code on GitHub, see contributions: https://github.com/haiwen/ (Seafile desktop client; Seahub, the Seafile Web interface; Seafile server core)
experience from requesting features very positive: e.g. SAML/Shibboleth etc.
WHY SEAFILE FOR SYNC&SHARE? OUR DECISION FACTORS
USERS want the Dropbox equivalent, not something else!
WE@PSNC want to fully exploit users' proximity to our DC and the network bandwidth
PERFORMANCE is what users appreciate, especially if they deal with many and large files (see next slides)
RELIABILITY is what they expect! Failures are not forgiven
WHY NOT OTHER SOLUTIONS? A MULTI-PURPOSE NATURE IS NOT ALWAYS AN ADVANTAGE
Image source: http://www.fastcarinvasion.com/must-see-moment-tractor-crosses-way-racing-car/
PERFORMANCE & RESOURCE USAGE COMPARISON
DISCLAIMER
DISCLAIMER ON PERFORMANCE TESTING PROCEDURE
We tested the most popular / interesting solutions
We put effort into making the tests as objective as possible
Solutions were tested under the same conditions (server, back-end, OS)
The test procedure was identical to the extent possible
We used community versions (not enterprise)
No special performance tuning was performed
No feedback from developers processed (yet)
SEAFILE VS OTHERS PERFORMANCE TEST: 1. SMALL FILES
Testing set: Linux kernel source v4.5.3; 706 MB of data; 52 881 files; 3 544 directories; metadata-intensive
Test scenario:
client 1: kernel source unpacked
client 1: data set upload
client 2: data set download
client 1: random data removal (388 dirs, 4328 files)
client 1: removal propagation
client 2: change detection
Our testing environment is described in the backup slides
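The data set statistics and the removal step above can be scripted. Below is a minimal Python sketch under our own assumptions: count_tree and remove_random_subtrees are hypothetical helpers, and picking whole directories at random with a fixed seed is an illustrative choice, not necessarily how the 388 directories / 4328 files were selected in our test. Counts obtained with os.walk may also differ slightly from other tools (e.g. in how symbolic links are treated).

```python
import os
import random
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def count_tree(root: Path) -> Tuple[int, int]:
    """Count files and directories below `root` (root itself not counted)."""
    files = dirs = 0
    for _, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs


def remove_random_subtrees(root: Path, k: int, seed: int = 0) -> Tuple[int, int]:
    """Delete k randomly chosen directories with their contents; return (files, dirs) removed."""
    candidates = sorted(p for p in root.rglob("*") if p.is_dir())
    rng = random.Random(seed)
    removed_files = removed_dirs = 0
    for d in rng.sample(candidates, min(k, len(candidates))):
        if not d.exists():  # already deleted together with a chosen ancestor
            continue
        files, dirs = count_tree(d)
        removed_files += files
        removed_dirs += dirs + 1  # nested directories plus d itself
        shutil.rmtree(d)
    return removed_files, removed_dirs


# Tiny demo on a throw-away tree instead of the kernel sources.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a" / "b").mkdir(parents=True)
    (root / "c").mkdir()
    for name in ("a/x.c", "a/b/y.c", "c/z.c"):
        (root / name).write_text("int main(void) { return 0; }\n")
    print(count_tree(root))  # (3, 3)
    print(remove_random_subtrees(root, k=1))
```

Returning the number of removed entries matters because the removal and change-detection speeds are expressed in files-dirs per second.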
SEAFILE VS OTHERS: SMALL FILES PERFORMANCE TEST (TIME)
Operation | Seafile [s] | theother [s] | theother [min]
Client 1: upload | 90 | 1800 | 35
Client 2: download | 60 | 1320 | 22
Client 1: folder removal | 1-2 | 60-180 | 1-3
Client 2: change detection | 1-2 | 4 | ~0.07
SEAFILE VS OTHERS: SMALL FILES PERFORMANCE TEST (SPEED)
Operation | Seafile [files-dirs/s] | theother [files-dirs/s] | difference
Client 1: upload | 627 | 27 | 23x
Client 2: download | 940 | 43 | 22x
Client 1: folder removal | 25 000-50 000 | 26-78 | 181-640x
Client 2: change detection | 2358-4716 | 1179 | 2-4x
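Several speed figures can be cross-checked from the entry counts and the measured times. Assuming "files-dirs/s" means (files + directories) divided by wall-clock time, Seafile's upload is (52 881 + 3 544) / 90 s ≈ 627 entries/s. The short Python check below reproduces a few rows; the helper name rate is ours.

```python
# Entry counts taken from the small-files test description.
FILES, DIRS = 52_881, 3_544
REMOVED_FILES, REMOVED_DIRS = 4_328, 388


def rate(items: int, seconds: float) -> int:
    """Throughput in files-dirs per second, rounded as in the table."""
    return round(items / seconds)


total = FILES + DIRS                     # 56 425 entries in the whole data set
removed = REMOVED_FILES + REMOVED_DIRS   # 4 716 entries removed in the removal step

print(rate(total, 90))                      # Seafile upload -> 627
print(rate(total, 60))                      # Seafile download -> 940
print(rate(total, 1320))                    # theother download -> 43
print(rate(removed, 4))                     # theother change detection -> 1179
print(rate(removed, 2), rate(removed, 1))   # Seafile change detection -> 2358 4716
print(round(627 / 27), round(940 / 43))     # speed-up factors -> 23 22
```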
SEAFILE VS OTHERS PERFORMANCE COMPARISON TEST: LARGE FILES
Testing set: 5x 1 GB files; 5 GB of data; not metadata-intensive
Test scenario (both operations measured):
client 1: upload
client 2: download
Our testing environment is described in the backup slides
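For reproducing a comparable large-file test set, a minimal Python sketch. It assumes that 1 GB means 1 GiB (2^30 bytes) and that the files hold random data; both are our assumptions, as the slide does not say how the files were produced. Random content matters because identical or highly compressible files would let deduplication or compression favour one solution. make_test_files is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def make_test_files(target: Path, count: int = 5, size: int = 1 << 30,
                    block: int = 1 << 20) -> List[Path]:
    """Write `count` files of `size` bytes of random, incompressible data."""
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in range(count):
        path = target / f"large_{n}.bin"
        with open(path, "wb") as fh:
            remaining = size
            while remaining > 0:  # write in blocks to keep memory use low
                step = min(block, remaining)
                fh.write(os.urandom(step))
                remaining -= step
        paths.append(path)
    return paths


# Small demo; the full set would be make_test_files(Path("large-files")).
with tempfile.TemporaryDirectory() as tmp:
    demo = make_test_files(Path(tmp), count=2, size=3000, block=1024)
    print([p.stat().st_size for p in demo])  # [3000, 3000]
```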
SEAFILE VS OTHERS: LARGE FILES PERFORMANCE TEST (TIME AND SPEED)
TIME | Seafile [s] | theother [s]
5x1GB file upload | 36 | 56
5x1GB file download | 21 | 8.5
SPEED | Seafile [GB/s] | theother [GB/s]
5x1GB file upload | 0.17 | 0.11
5x1GB file download | 0.29 | 0.71
SERVER RESOURCE USAGE: SEAFILE VS OTHERS, SMALL FILES TEST
Seafile server: ~4.5% for 10 minutes
theother server: ~10% for 0.5 hour
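Assuming both percentages are average utilisation of the same server resource (the slide does not name the metric) over the whole run, multiplying by the run duration gives a rough resource-time figure for the same workload. The Python sketch below does that arithmetic; busy_seconds is a hypothetical helper.

```python
def busy_seconds(avg_utilisation: float, minutes: float) -> float:
    """Average utilisation (0-1) over a run multiplied by the run duration in seconds."""
    return avg_utilisation * minutes * 60


seafile = busy_seconds(0.045, 10)   # ~4.5% for 10 minutes
theother = busy_seconds(0.10, 30)   # ~10% for half an hour
print(f"{seafile:.0f} s vs {theother:.0f} s, ratio {theother / seafile:.1f}x")
# 27 s vs 180 s, ratio 6.7x
```

Under this assumption theother needs roughly 6.7 times more server resource-time for the same small-files workload.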
WHY DOES THIS MATTER? PERFORMANCE & RESOURCE USAGE
At present we are targeting the academic sector: researchers and staff, then students
In future we might even target primary/secondary schools with cloud services
Excess resource usage might become very costly
USER EXPERIENCE
USER EXPERIENCE: GENERAL
USERS wanted a Dropbox equivalent, not something else!
USERS got: a REALLY FAST Dropbox equivalent!
No load on clients
Many libraries / local directory pairs can be configured
Thunderbird Filelink plug-in widely used & appreciated!
Usage: Web, Filelink, desktop clients, mobile apps
Use cases:
syncing across devices
sharing (e.g. books to a Calibre folder)
content sharing and provisioning: simplicity! (PSNC PR dept.)
even large data backups (~0.5 TB)
USER EXPERIENCE: POWER USERS' STORIES
USER1: Adam Mickiewicz University in Poznań, Faculty of Physics, 230 GB @BOX
Syncing computation results: 30 GB/file
Plus: documents, publications, presentations, and analyses of computing results
User comment: "I would appreciate no limits!" (100 GB would not solve the problem)
USER2: Adam Mickiewicz University in Poznań, Faculty of Physics, 550 GB @BOX
Syncing simulation result data between 2 computers
Backup copies (!) of the data
User comment: "BOX makes it easy & safe to store data, and to sync and transfer it between 2 machines"
USER3: PR dept. @PSNC, 128 GB @BOX
Syncing photos and graphic files: 20 kB-4 GB/file, 38k files
Planned: serving content (images) for WordPress-based websites directly from Seafile servers
User comment: "No limits! Keep backups and versions!"
OPERATIONAL EXPERIENCE
OPERATIONAL EXPERIENCE
WORKS LIKE A CHARM!
very low load on the server: running on a 4 GB, 1 vCPU VM for >1 year
No interventions related to server issues; the only interventions were:
increasing Apache/JavaScript limits (yep, some users wanted to download a zipped 30 GB directory through the Web)
adding storage space :) :) :)
No special problem-solving knowledge needed so far, but it's not yet at a large scale
FUTURE PLANS
FUTURE PLANS: IN-LAB WORK
FURTHER SCALABILITY TESTS: clustered setup with load balancing & HA; Web interface tests (desktop clients so far)
Seafile with various back-ends: Ceph, IBM GPFS, EMC ScaleIO + clusterFS
SECURITY AUDIT: 1st phase: very promising results; 2nd phase: until June/July 2016
FUTURE PLANS: BATTLEFIELD WORK
ROLLOUT: to more universities (larger pilot); BOX.PSNC.PL -> BOX.PIONIER.NET.PL
Integration with federated AAI: successfully tested with eduGAIN under Windows, Linux, MacOS and Android; AAI to be tested on more platforms
CONCLUSIONS
CONCLUSIONS
We decided to use Seafile as a performant, reliable and specialised sync & share solution
We're testing scalability in the laboratory and in real life (pilot since 2014)
We're collecting numbers, observations and user/ops experience, and are open to sharing them
We're working on various ways of integrating the service into users' workflows; these will also be shared
MESSAGE TO NRENS
MESSAGE
As a community we need diversity in technologies and approaches
We address various use cases, so the generality vs specialisation discussion will keep happening
While using one(Cloud) ;) technology, we need to look at the alternatives
Sharing is caring, so we should stay open to trying new things and sharing both good and bad experiences
There are forums to share real-life experience: OpenStack Operators forum [OSO]; TF-Storage -> TF-Storage&IaaS
FIRST EXPERIENCE WITH SEAFILE SYNC & SHARE SOLUTION
THANK YOU!
box.psnc.pl
Maciej Brzeźniak, Stanisław Jankowski, Sławomir Zdanowski, Krzysztof Wadówka
PSNC, Poznań, Poland
MORE PERFORMANCE FIGURES IN THE BACKUP SLIDES
BACKUP SLIDES (PERFORMANCE MONITORING)
SMALL FILES TEST (LINUX KERNEL SOURCE)
SEAFILE RESOURCE USAGE, VERSION 2 [charts: Client 1, Client 2, Server]
THEOTHER RESOURCE USAGE, VERSION 2 [charts: Client 1, Client 2, Server]
CLIENT-CLIENT COMPARISON: SEAFILE VS THEOTHER [charts: Seafile client 1, theother client 1]
CLIENT-CLIENT COMPARISON: SEAFILE VS THEOTHER [charts: Seafile client 2, ownCloud client 2]
SERVER-SERVER COMPARISON: SEAFILE VS THEOTHER [charts: Seafile server, theother server]
TESTBED CONFIGURATION
TESTBED CONFIGURATION
2 servers: 1 for Seafile, 1 for the others
Server model: Huawei RH2288 V3
CPU: 2x Intel Xeon E5-2620 v3 2.4 GHz
RAM: 128 GB
Interfaces: Ethernet: 2x 10 Gbit, active/passive bonding; FC: 2x 16 Gbit FC, multipathing
Storage: 12x 10 TB LUNs with RAID10, striped host-side with LVM
Software: Seafile 5.1.7 (free version); ownCloud 9.0.2 (community); OS: Ubuntu 14.04.4 LTS