Bootstrapping a (New?) LHC Data Transfer Ecosystem


Brian Paul Bockelman, Andy Hanushevsky, Oliver Keeble, Mario Lassnig, Paul Millar, Derek Weitzel, Wei Yang

Why am I here?

The announcement in mid-2017 that Globus Toolkit support would end set off a flurry of activity. Some of it went toward shorter-term collaborations around community support of this software; see https://gridcf.org.

It also reinvigorated existing work on replacing various Globus Toolkit components. The most pressing are:

- Grid Security Infrastructure (GSI): an authentication and authorization infrastructure built around the concepts of identity and X509 proxies.
- GridFTP: an FTP-like transfer protocol that builds on top of GSI and supports third-party transfers and multi-TCP-stream transfers.

Luckily, there's a huge amount of prior effort to draw on, some dating back several years. Hence it's not really new. Thanks to the work of many people, what's shown here is just some clever rearrangement!

WLCG Transfer Ecosystem Demonstrator

There's a need to organize the entire vertical stack to arrive at a cohesive solution. We benefit little if multiple storage elements take mutually incompatible approaches. The same applies when moving across the data management, file transfer, and storage layers.

The stack, from top to bottom:

- Data Management Layer: Rucio, PhEDEx
- File Transfer Layer: FTS
- Storage Layer: dCache, XRootD, EOS, StoRM, DPM

We put together a Google group to coordinate this activity and start to scale. Feel free to join! https://groups.google.com/forum/#!forum/wlcg-http-transfer

Transfers Under GridFTP - Where we are today!

The File Transfer Service (FTS) talks to both storage endpoints:

- FTS to Storage A. Request: "Send file 1 to port 1234 on Storage B." Response: "OK, in progress!"
- FTS to Storage B. Request: "Start receiving file 1." Response: "OK, listening on port 1234."
- Storage A sends the bytestream over TCP to Storage B.

Key points:

- FTS must be authorized to talk to both endpoints.
- Both endpoints support the same protocol (GridFTP).
- Queueing (in the implementation) is in the FTS layer.

Alternate TPC (third-party copy) Model - Where we might go!

FTS sends a single request to Storage A: "Send file 1 to URL on Storage B, using the given authentication with Storage B." Storage A responds "OK, in progress!" and then moves the data to or from Storage B with a GET or PUT.

- FTS only communicates with the active storage (A).
- FTS provides the URL for B and an authorization token.
- The transfer from A to B may occur over any mutually supported protocol.
- FTS relies on Storage A for the heavy lifting.
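The division of labor in this model can be sketched as a toy, in-memory simulation. This is only an illustration of who talks to whom: the class ToyStorage, the function fts_submit, and the token strings are hypothetical names invented for the sketch, and no real FTS or storage API is involved.

```python
class ToyStorage:
    """In-memory stand-in for a storage element (hypothetical, illustration only)."""

    def __init__(self, name, accepted_tokens):
        self.name = name
        self.accepted_tokens = set(accepted_tokens)
        self.files = {}

    def _check(self, token):
        # Reject any caller that does not present a token this storage accepts.
        if token not in self.accepted_tokens:
            raise PermissionError(f"{self.name}: token rejected")

    def put(self, path, data, token):
        """Accept a file from a caller presenting a valid token."""
        self._check(token)
        self.files[path] = data

    def push_to(self, path, token, remote, remote_path, remote_token):
        """Active-side handler: authorize the caller, then move the data ourselves."""
        self._check(token)
        remote.put(remote_path, self.files[path], remote_token)
        return "OK"


def fts_submit(active, path, active_token, passive, passive_path, passive_token):
    """FTS contacts only the active storage and hands it the token for the other side."""
    return active.push_to(path, active_token, passive, passive_path, passive_token)


if __name__ == "__main__":
    storage_a = ToyStorage("A", {"token-for-a"})
    storage_b = ToyStorage("B", {"token-for-b"})
    storage_a.files["/file1"] = b"payload"
    print(fts_submit(storage_a, "/file1", "token-for-a",
                     storage_b, "/file1", "token-for-b"))
    print(sorted(storage_b.files))
```

Note that fts_submit never calls a method on the passive storage; the only thing it contributes for Storage B is a location and a token, which is the point of the model.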

HTTPS / WebDAV

WebDAV is a set of HTTP extensions that provide a more standardized, file-like API with minimal changes to HTTP. Example: MKCOL (make collection) is mostly equivalent to a POSIX mkdir().

Another WebDAV extension is COPY, which instructs the WebDAV server to copy to or from a given URL. This is precisely what is needed for the alternate TPC model! The URL is given in the Source header, and it is not necessarily HTTPS:

    COPY /store/path HTTP/1.1
    Host: storage.site1.com
    Source: https://storage.site2.com/store/path.src
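As a minimal sketch of what the client side has to produce, the function below assembles the text of such a COPY request. The function name build_copy_request is made up for this illustration; it only formats the three lines shown above and does not send anything over the network.

```python
def build_copy_request(dest_host, dest_path, source_url):
    """Return the raw text of a pull-mode COPY request (illustrative helper).

    dest_host and dest_path name the resource on the active storage;
    source_url is where that storage should fetch the data from.
    """
    lines = [
        f"COPY {dest_path} HTTP/1.1",
        f"Host: {dest_host}",
        f"Source: {source_url}",
        "",  # blank line terminates the header section
        "",
    ]
    # HTTP/1.1 separates header lines with CRLF.
    return "\r\n".join(lines)


if __name__ == "__main__":
    request = build_copy_request(
        "storage.site1.com",
        "/store/path",
        "https://storage.site2.com/store/path.src",
    )
    print(request)
```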

HTTPS / WebDAV - Authorization Step

It's clear FTS can use its favorite existing mechanism when communicating with the active SE (Storage A). But how does it transfer a credential to the active SE for use with Storage B?

In X509-land, we have the concept of delegating a credential for this movement. Unfortunately, the delegation procedure is only standardized at the transport layer (TCP). The WLCG community has a somewhat ad-hoc* standard for this based on SOAP, as defined by GridSite.

We advocate a token-based method for authorizing the transfer instead.

* https://egee-jra1-data.web.cern.ch/egee-jra1-data/gridsitedelegation/head/doc/glite-security-delegation-interface/delegationinterface.html

Reminder: Here's our picture

Request 1, from FTS (fts_url_copy) to storage.site1.com:

    COPY /store/path HTTP/1.1
    Host: storage.site1.com
    Source: https://storage.site2.com/store/path.src
    Authorization: Bearer abcdef
    Copy-Header: Authorization: Bearer 12345

Request 2, from storage.site1.com to storage.site2.com:

    GET /store/path.src HTTP/1.1
    Host: storage.site2.com
    Authorization: Bearer 12345

Here, I illustrate the case where the actual copy also goes over HTTP.

HTTP Request

Example request from FTS to the active SE:

    COPY /store/path HTTP/1.1
    Host: storage.site1.com
    Source: https://storage.site2.com/store/path.src
    Authorization: Bearer abcdef
    Copy-Header: Authorization: Bearer 12345

- COPY: the HTTP verb.
- /store/path: the resource at the active SE (destination).
- Source: the source URL.
- Authorization: Bearer abcdef: the token for the active SE.
- Copy-Header: an indication to copy the enclosed header to the GET request; Bearer 12345 is the token for the inactive SE.
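On the active SE, the annotated request translates into a GET toward the inactive SE. The sketch below shows one way that translation could look, assuming the headers arrive as (name, value) pairs; plan_source_get is a hypothetical helper written for this illustration and is not taken from any of the storage implementations.

```python
def plan_source_get(copy_headers):
    """Derive the source URL and GET headers from a COPY request's headers.

    copy_headers is a list of (name, value) pairs. Each Copy-Header value
    has the form "Name: value" and is forwarded to the GET request; the
    COPY request's own Authorization header is deliberately not forwarded.
    """
    source_url = None
    forwarded = {}
    for name, value in copy_headers:
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered == "source":
            source_url = value.strip()
        elif lowered == "copy-header":
            inner_name, sep, inner_value = value.partition(":")
            if not sep:
                raise ValueError(f"malformed Copy-Header value: {value!r}")
            forwarded[inner_name.strip()] = inner_value.strip()
    if source_url is None:
        raise ValueError("COPY request has no Source header")
    return source_url, forwarded


if __name__ == "__main__":
    headers = [
        ("Host", "storage.site1.com"),
        ("Source", "https://storage.site2.com/store/path.src"),
        ("Authorization", "Bearer abcdef"),
        ("Copy-Header", "Authorization: Bearer 12345"),
    ]
    print(plan_source_get(headers))
```

Keeping the two tokens separate means the inactive SE never sees the credential that FTS used with the active SE.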

Real Request

Here's an example request from yesterday:

    COPY /user/uscms01/pnfs/..truncated../loadtestdownload/loadtest07_ucsd_b0_ni3ssxhd0it5rf6w_286 HTTP/1.1
    User-Agent: fts_url_copy/3.7.7 gfal2/2.15.0 neon/0.0.29
    TE: trailers
    Host: red-gridftp12.unl.edu:1094
    Source: https://gftp-1.t2.ucsd.edu:1094/cms/..truncated../loadtest07_ucsd_b0
    X-Number-Of-Streams: 3
    Secure-Redirection: 1
    Authorization: Bearer eyjhbgcioi..truncated..nyu5gx6yrzhkpdct2sedvocihzsuqknunzcrhxj6tbjxoza
    ClientInfo: job-id=dc417124-30d7-11e8-bd67-5254000b9cba;file-id=1080;retry=0
    TransferHeaderAuthorization: Bearer eyjhbgci..truncated..5_-z7xqw
    RequireChecksumVerification: false
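The ClientInfo header in this request carries semicolon-separated key=value pairs, which makes it easy to correlate a storage-side log entry with an FTS job. A small parsing sketch follows; the format is inferred only from the single example above, and parse_client_info is a name invented for this illustration.

```python
def parse_client_info(value):
    """Split a ClientInfo header value into a dict of string fields.

    Assumes "key=value" items separated by ";", as in the example request.
    """
    fields = {}
    for item in value.split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        key, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"malformed ClientInfo item: {item!r}")
        fields[key.strip()] = val.strip()
    return fields


if __name__ == "__main__":
    info = parse_client_info(
        "job-id=dc417124-30d7-11e8-bd67-5254000b9cba;file-id=1080;retry=0"
    )
    print(info["job-id"], info["file-id"], info["retry"])
```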

Get Your Tokens!

In the latest FTS release, at the start of a transfer, FTS will:

- Generate a SciToken* if a token issuer is specified (see the next presentation!).
- If that fails, use the X509 proxy to generate a macaroon (see Paul's presentation on macaroons).
- If that fails, fall back to GridSite-based delegation.

The multiple fallbacks are designed to provide a smooth transition off X509!

* Currently authenticates with the token server using an X509 proxy, but hopefully not in the future.
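The fallback order above can be sketched as a simple "first mechanism that works" loop. This is not the FTS implementation: obtain_transfer_credential and the three provider functions in the usage example are hypothetical placeholders, and real providers would contact a token issuer, a storage endpoint, or a delegation service.

```python
def obtain_transfer_credential(providers):
    """Try each (name, provider) pair in order and return the first success.

    A provider is a zero-argument callable that returns a credential,
    returns None when the mechanism is not configured, or raises on failure.
    Returns (mechanism_name, credential, failures_so_far).
    """
    failures = []
    for name, provider in providers:
        try:
            credential = provider()
        except Exception as exc:  # any failure moves on to the next mechanism
            failures.append((name, str(exc)))
            continue
        if credential is not None:
            return name, credential, failures
        failures.append((name, "not available"))
    raise RuntimeError(f"no credential mechanism succeeded: {failures}")


if __name__ == "__main__":
    # Hypothetical stand-ins for the three mechanisms, in the slide's order.
    def scitoken():
        return None  # pretend no token issuer is configured

    def macaroon():
        raise RuntimeError("endpoint did not issue a macaroon")

    def gridsite_delegation():
        return "delegation-id-123"

    print(obtain_transfer_credential([
        ("scitoken", scitoken),
        ("macaroon", macaroon),
        ("gridsite", gridsite_delegation),
    ]))
```

Recording why each earlier mechanism was skipped is a design choice of the sketch; it makes it easier to see how far along a site is in moving off X509.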

Working up the Stack

- We have an initial prototype functioning as XRootD plugins. It is stable enough to run on production servers at three different sites. Hopefully this can benefit EOS and DPM as well!
- dCache has test code for SciTokens, and macaroons are in production.
- GFAL2, DAVIX, and FTS have patches in release (or testing) supporting the end-to-end transfer.
- PhEDEx changes are available as a patch, and Rucio changes are in a testing branch.

Working the vertical: patches across about a dozen software packages.

Timeline to Success?

Given the bleeding-edge versions of the software (and appropriate configurations), we have shown that HTTPS can technically replace GridFTP for all storage used in the WLCG.

Work for the remainder of 2018:

- Demonstrate the full compatibility matrix of SEs in production.
- Design and perform scale tests.

More importantly, what's next?

- We need to work with a few experiments (LHC or otherwise) to determine schedule and future intent.
- Should we set a bold goal like "All WLCG sites supporting CMS must support non-GridFTP third-party copy in 2019"?
- Otherwise, there's a danger this effort will fizzle out!