Efficient HTTP-based I/O on very large datasets for high-performance computing with the Libdavix library

Authors: Adrien Devresse (CERN), Fabrizio Furano (CERN)

Typical HPC architecture: a computing cluster connected over a high-speed network to large data storage systems. Remote I/O requirements in HPC: low latency, high throughput, parallel access, reliability.

HPC I/O protocols: a protocol zoo... iRODS, GridFTP, dcap, AFS.

Very specific protocols: they are often tied to one storage system, use advanced caching strategies and optimizations, and are tuned for the HPC requirements above: high throughput, parallel access, low latency.

Standards: should we define yet another one? A classic problem.

A crazy idea: why not use HTTP? Is this madness?

At first, it seems crazy. HTTP is text-based and stateless, has no multiplexing, suffers from the TCP slow-start mechanism, has no standard fail-over mechanism, no multi-source / multi-stream transfers, and only incomplete support for partial I/O.

Not so crazy. HTTP is widespread, has a rich ecosystem and powerful actors behind it, and is a protocol that scales. HTTP caching is easy to deploy, most storage systems today provide an HTTP gateway, and HTTP is flexible and extensible.

What we did: created a toolkit for HPC I/O over the HTTP protocol, named DAVIX; applied several optimizations to make HTTP competitive in performance with HPC-specific protocols; and benchmarked it with a High Energy Physics data-analysis workflow.

Problem: parallelism and persistent connections.
Operation multiplexing: HPC I/O protocols: yes; HTTP: pipelining only.
Persistent connections: HPC I/O protocols: yes; HTTP: KeepAlive.

Multiplexing vs pipelining: pipelined requests must be answered in order, so pipelining introduces latency.

Optimization: recycling and request dispatch pattern. Maximize the usage of each TCP connection with KeepAlive, and dispatch parallel queries to different execution queues using a session pool pattern; see the sketch after the diagram below.

Optimization: recycling and request dispatch pattern. [Diagram: requests flow through a dispatcher into a connection recycler that hands them to pooled KeepAlive connections.]
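
A minimal sketch of this session-pool pattern, assuming libcurl as the HTTP engine (DAVIX ships its own HTTP layer; the SessionPool class and fetch() function here are illustrative names, not DAVIX API):

#include <curl/curl.h>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Discard response bodies; a real client would hand them back to the caller.
static size_t sink(char*, size_t size, size_t nmemb, void*) { return size * nmemb; }

class SessionPool {
public:
    // Call curl_global_init(CURL_GLOBAL_DEFAULT) once before constructing.
    explicit SessionPool(unsigned nworkers) {
        for (unsigned i = 0; i < nworkers; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~SessionPool() {
        { std::lock_guard<std::mutex> l(mtx_); done_ = true; }
        cv_.notify_all();
        for (std::thread& w : workers_) w.join();
    }
    // Dispatch a GET to the execution queues served by the pooled sessions.
    void fetch(std::string url) {
        { std::lock_guard<std::mutex> l(mtx_); urls_.push(std::move(url)); }
        cv_.notify_one();
    }
private:
    void run() {
        // One easy handle per worker: libcurl keeps the TCP connection alive
        // (KeepAlive) and recycles it for successive requests to the same host.
        CURL* h = curl_easy_init();
        curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, sink);
        for (;;) {
            std::string url;
            {
                std::unique_lock<std::mutex> l(mtx_);
                cv_.wait(l, [this] { return done_ || !urls_.empty(); });
                if (urls_.empty()) break;  // done_ set and queue drained
                url = std::move(urls_.front());
                urls_.pop();
            }
            curl_easy_setopt(h, CURLOPT_URL, url.c_str());
            curl_easy_perform(h);  // connection reused across iterations
        }
        curl_easy_cleanup(h);
    }
    std::vector<std::thread> workers_;
    std::queue<std::string> urls_;
    std::mutex mtx_;
    std::condition_variable cv_;
    bool done_ = false;
};

A pool of a few such sessions keeps every TCP connection warm while spreading parallel queries across them.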

Optimization: bulk query system with branch prediction. High Energy Physics data are compressed, so reads consist of a significant number of little data chunks. We vectorize sequential I/O operations into bulk operations based on the HTTP multipart content type, with vector sizes above 10000 chunks, and we use a cache with informed prefetching (TTreeCache) to reduce the number of network queries. A sketch of the idea follows.
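
A minimal sketch of such a vectorized read at the HTTP level, assuming libcurl and a server that honours multi-range GET requests with a 206 multipart/byteranges response (the Chunk type and bulkRead() function are illustrative, not the DAVIX API; parsing of the multipart payload is omitted):

#include <curl/curl.h>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct Chunk { uint64_t offset; uint64_t length; };

// Collect the raw multipart/byteranges payload; a real client would split it
// back into the individual chunks using the part boundaries.
static size_t collect(char* p, size_t size, size_t nmemb, void* userdata) {
    static_cast<std::string*>(userdata)->append(p, size * nmemb);
    return size * nmemb;
}

// One network round trip for N chunks instead of N round trips.
std::string bulkRead(const std::string& url, const std::vector<Chunk>& chunks) {
    std::string ranges;  // "0-4095,10000-10511,..." for the Range header
    for (const Chunk& c : chunks) {
        if (!ranges.empty()) ranges += ',';
        ranges += std::to_string(c.offset) + '-' +
                  std::to_string(c.offset + c.length - 1);
    }
    std::string body;
    CURL* h = curl_easy_init();
    curl_easy_setopt(h, CURLOPT_URL, url.c_str());
    curl_easy_setopt(h, CURLOPT_RANGE, ranges.c_str());  // multi-range GET
    curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(h, CURLOPT_WRITEDATA, &body);
    CURLcode rc = curl_easy_perform(h);
    curl_easy_cleanup(h);
    if (rc != CURLE_OK)
        std::fprintf(stderr, "bulk read failed: %s\n", curl_easy_strerror(rc));
    return body;  // multipart/byteranges payload
}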

Optimization: bulk query system with informed prefetching.

Details of the benchmarks (1): we execute a HEP analysis job based on the ROOT data-analysis framework. Each job remotely reads 12000 events from a 700 MByte compressed file. For remote I/O we use the XRootD toolkit with the xrootd protocol, and DAVIX with the HTTP protocol; an illustrative sketch follows.
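
For illustration only, a remote-read job of this kind can be sketched with ROOT as below; TFile::Open() selects the I/O backend from the URL scheme (the xrootd client for root://, the HTTP/Davix plugin for http:// in ROOT builds that include it). The server URL and the tree name "events" are hypothetical:

#include "TFile.h"
#include "TTree.h"

void readRemote() {
    // Same analysis code over two transports: only the URL scheme changes.
    TFile* f = TFile::Open("http://server.example.org/data/events.root");
    if (!f || f->IsZombie()) return;
    TTree* t = nullptr;
    f->GetObject("events", t);              // hypothetical tree name
    if (t) {
        t->SetCacheSize(30 * 1024 * 1024);  // enable TTreeCache prefetching
        const Long64_t n = t->GetEntries();
        for (Long64_t i = 0; i < n; ++i)
            t->GetEntry(i);                 // small reads coalesce into bulk requests
    }
    f->Close();
}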

Details of the benchmarks (2): each job is executed with the HammerCloud grid testing framework. Results were obtained over 576 runs spanning 12 days, against a Disk Pool Manager 1.8.8 storage system with a 4-core Intel Xeon CPU, 32 GB of RAM and a 1 Gigabit network link.

Performance after optimizations. [Plot: average execution time of the job, in seconds; CERN to CERN: analysis over LAN access; CERN to UK: analysis over the pan-European network.]

Problem: reliability and replicas. We operate in a world-wide distributed environment: data object replicas are spread across different datacenters and stored with different storage-system technologies. HTTP, however, is a 1-to-1 client-server protocol with no recovery in case of server failure.

Metalink and HTTP. Metalink is a standard file format supporting replicas and metadata descriptions; we use metalink for transparent recovery in case of server unavailability.

<?xml version="1.0" encoding="utf-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <published>2009-05-15T12:23:23Z</published>
  <file name="example.ext">
    <size>14471447</size>
    <identity>example</identity>
    <version>1.0</version>
  </file>
  <file name="example2.ext">
    <size>14471447</size>
    <identity>example2</identity>
    <description>Another description for a second file.</description>
    <hash type="sha-256">2f548ce50c459a0270e85a7d63b2383c5523...</hash>
    <url location="de" priority="1">ftp://ftp.example.com/example2.ext</url>
    <url location="fr" priority="1">http://example.com/example2.ext</url>
    <metaurl mediatype="torrent" priority="2">http://example.com/example2.ext.torrent</metaurl>
  </file>
</metalink>

Optimization: Metalink and HTTP for transparent recovery. [Diagram: a worker node asks the dynamic storage federator (DFS) for a file and receives a metalink listing its replicas.]

Optimization: Metalink and HTTP for transparent recovery. This transparently recovers from a server failure as long as one replica is available worldwide, and allows multi-stream transfers from different sources over HTTP; see the sketch below.
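
A minimal sketch of such replica fail-over, assuming libcurl and that the replica URLs have already been extracted from the metalink in priority order (fetchWithFailover() is an illustrative name, not DAVIX API):

#include <curl/curl.h>
#include <string>
#include <vector>

static size_t sink(char* p, size_t size, size_t nmemb, void* out) {
    static_cast<std::string*>(out)->append(p, size * nmemb);
    return size * nmemb;
}

// Try each replica in turn; any single surviving replica keeps the job alive.
bool fetchWithFailover(const std::vector<std::string>& replicas, std::string& out) {
    for (const std::string& url : replicas) {
        out.clear();
        CURL* h = curl_easy_init();
        curl_easy_setopt(h, CURLOPT_URL, url.c_str());
        curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, sink);
        curl_easy_setopt(h, CURLOPT_WRITEDATA, &out);
        curl_easy_setopt(h, CURLOPT_FAILONERROR, 1L);  // treat HTTP >= 400 as failure
        CURLcode rc = curl_easy_perform(h);
        curl_easy_cleanup(h);
        if (rc == CURLE_OK) return true;  // this replica worked
        // Otherwise: transparent recovery, fall through to the next replica.
    }
    return false;  // all replicas down
}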

Conclusion: HTTP can compete with HPC-specific protocols for data-analysis use cases. HTTP's weaknesses in HPC can be compensated with informed prefetching, session recycling and large bulk-operation support. The reliability of I/O over HTTP in a distributed environment can be greatly improved with Metalink support.

About DAVIX: it offers an I/O and a file-management API; it is a C++ shared library plus a set of tools; it is already released as open source; it is integrated with the ROOT analysis framework; and it is used by the File Transfer Service of the Worldwide LHC Computing Grid.

Information
About Davix: http://dmc.web.cern.ch/projects/davix/home
About our HTTP dynamic federation: https://svnweb.cern.ch/trac/lcgdm/wiki/dynafeds
About the ROOT analysis framework: http://root.cern.ch/drupal/

Questions?