Access Coordination of Tertiary Storage for High Energy Physics Applications


Access Coordination of Tertiary Storage for High Energy Physics Applications. Luis Bernardo, Arie Shoshani, Alex Sim, Henrik Nordberg. Scientific Data Management Group, Computing Science Directorate, Lawrence Berkeley National Laboratory

Outline
- Short High Energy Physics overview (of the data handling problem)
- Description of the Storage Coordination System
- File tracking
- Tertiary storage coordination
  - Queuing file transfers
  - File queue management
  - File clustering parameter
- Transfer rate estimation
- Query estimation (total time)
- Error handling

Optimizing Storage Management for High Energy Physics Applications

Data volumes for planned HENP experiments:

Collaboration   # members/institutions   Date of first data   # events/year   Total data volume/year (TB)
STAR            350/35                   2000                 10^7-10^8       300
PHENIX          350/35                   2000                 10^9            600
BABAR           300/30                   1999                 10^9            80
CLAS            200/40                   1997                 10^10           300
ATLAS           1200/140                 2004                 10^9            2000

STAR: Solenoidal Tracker At RHIC; RHIC: Relativistic Heavy Ion Collider

Physical Layout of the Detector

Result of Particle Collision (event)

Typical Scientific Exploration Process
- Generate large amounts of raw data: large simulations, collect from experiments
- Post-processing of data: analyze the data (find particles produced, tracks); generate summary data, e.g. momentum, no. of pions, transverse energy; the number of properties is large (50-100)
- Analyze the data: use the summary data as a guide and extract subsets from the large dataset
- Need to access events based on partial properties specification (range queries), e.g. ((0.1 < AVpT < 0.2) ^ (10 < Np < 20)) v (N > 6000), then apply analysis code (the example query is evaluated in the sketch below)
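A minimal sketch of how such a range query over event summary properties can be evaluated. The property names (AVpT, Np, N) come from the example query on the slide; the record layout is hypothetical and is not the STACS query interface.

    # Sketch: evaluate the example range query
    #   ((0.1 < AVpT < 0.2) AND (10 < Np < 20)) OR (N > 6000)
    # against event summary records (hypothetical dictionaries).

    def qualifies(event):
        """Return True if the event's summary properties satisfy the query."""
        return ((0.1 < event["AVpT"] < 0.2 and 10 < event["Np"] < 20)
                or event["N"] > 6000)

    events = [
        {"AVpT": 0.15, "Np": 12, "N": 5200},   # passes the first conjunct
        {"AVpT": 0.45, "Np": 3,  "N": 9965},   # passes via N > 6000
        {"AVpT": 0.30, "Np": 15, "N": 1200},   # fails both
    ]

    selected = [e for e in events if qualifies(e)]
    print(len(selected), "of", len(events), "events qualify")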

Size of Data and Access Patterns
- STAR experiment: 10^8 events over 3 years, 1-10 MB per event of reconstructed data
- Events organized into 0.1-1 GB files: ~10^15 bytes total, ~10^6 files, ~30,000 tapes (30 GB tapes)
- Access patterns: subsets of events are selected for analysis by region in a high-dimensional property space, typically 10,000-50,000 out of a total of 10^8
- The selected data is randomly scattered all over the tapes
- Goal: optimize access from tape systems

EXAMPLE OF EVENT PROPERTY VALUES

I event 1
I N(1) 9965        I N(2) 1192        I N(3) 1704
I Npip(1) 2443     I Npip(2) 551      I Npip(3) 426
I Npim(1) 2480     I Npim(2) 541      I Npim(3) 382
I Nkp(1) 229       I Nkp(2) 30        I Nkp(3) 50
I Nkm(1) 209       I Nkm(2) 23        I Nkm(3) 32
I Np(1) 255        I Np(2) 34         I Np(3) 24
I Npbar(1) 94      I Npbar(2) 12      I Npbar(3) 24
I NSEC(1) 15607    I NSEC(2) 1342
I NSECpip(1) 638   I NSECpip(2) 191
I NSECpim(1) 728   I NSECpim(2) 206
I NSECkp(1) 3      I NSECkp(2) 0
I NSECkm(1) 0      I NSECkm(2) 0
I NSECp(1) 524     I NSECp(2) 244
I NSECpbar(1) 41   I NSECpbar(2) 8
R AVpT(1) 0.325951      R AVpT(2) 0.402098
R AVpTpip(1) 0.300771   R AVpTpip(2) 0.379093
R AVpTpim(1) 0.298997   R AVpTpim(2) 0.375859
R AVpTkp(1) 0.421875    R AVpTkp(2) 0.564385
R AVpTkm(1) 0.435554    R AVpTkm(2) 0.663398
R AVpTp(1) 0.651253     R AVpTp(2) 0.777526
R AVpTpbar(1) 0.399824  R AVpTpbar(2) 0.690237
I NHIGHpT(1) 205   I NHIGHpT(2) 7   I NHIGHpT(3) 1   I NHIGHpT(4) 0   I NHIGHpT(5) 0

54 properties, as many as 10^8 events

Opportunities for optimization
- Prevent / eliminate unwanted queries => query estimation (fast estimation index; a schematic bitmap-index sketch follows this list)
- Read only the events qualified for a query from a file (avoid reading irrelevant events) => exact index over all properties
- Share files brought into cache by multiple queries => look ahead for files needed, plus cache management
- Read files from the same tape when possible => coordinate file access from tape
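As an illustration of the "fast estimation index" idea, the sketch below builds a simple binned bitmap index over one property and answers a range query by OR-ing the bitmaps of the overlapping bins. This is only a schematic stand-in for the bit-sliced index used by the Query Estimator; the bin edges and values are invented for the example.

    # Sketch: a binned bitmap index for one property. A range query is answered
    # by OR-ing the bitmaps of all bins that overlap the query range; AND-ing
    # such results across properties would give a candidate-event estimate.

    def build_bitmap_index(values, bin_edges):
        """One bitmap (a Python int) per bin; bit i is set if event i falls in the bin."""
        bitmaps = [0] * (len(bin_edges) - 1)
        for i, v in enumerate(values):
            for b in range(len(bin_edges) - 1):
                if bin_edges[b] <= v < bin_edges[b + 1]:
                    bitmaps[b] |= 1 << i
                    break
        return bitmaps

    def range_query(bitmaps, bin_edges, lo, hi):
        """OR the bitmaps of every bin that overlaps [lo, hi)."""
        result = 0
        for b in range(len(bin_edges) - 1):
            if bin_edges[b] < hi and bin_edges[b + 1] > lo:
                result |= bitmaps[b]
        return result

    avg_pt = [0.15, 0.45, 0.30, 0.12, 0.55]          # AVpT values for 5 events
    edges = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    index = build_bitmap_index(avg_pt, edges)
    hits = range_query(index, edges, 0.1, 0.2)        # candidates for 0.1 <= AVpT < 0.2
    print("estimated qualifying events:", bin(hits).count("1"))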

The Storage Access Coordination System (STACS)

(Block diagram) The user's application sends query estimation / execution requests to the Query Estimator (QE), which is backed by a bit-sliced index, and opens, reads, and closes files in the Disk Cache. The Query Monitor (QM), with its Caching Policy Module, drives file caching and file purging, issuing file caching requests to the Cache Manager (CM), which uses the File Catalog (FC).

File Tracking (1)

File Tracking (2)

Queuing File Transfers
- The number of PFTPs to HPSS is limited; the limit is set by a parameter (NoPFTP) that can be changed dynamically
- The CM is multi-threaded: it issues and monitors multiple PFTPs in parallel
- All requests beyond the PFTP limit are queued (see the sketch below)
- The File Catalog provides, for each file: the HPSS path/file_name, the disk cache path/file_name, the file size, and the tape ID
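A minimal sketch, under assumed names, of how the PFTP limit could be enforced: a pool of at most NoPFTP worker threads, with all further requests waiting in a queue. The pftp_transfer function is a placeholder for the real HPSS transfer call, not part of STACS.

    # Sketch: cap the number of concurrent PFTP transfers at NoPFTP and queue
    # the rest. Names (pftp_transfer, NoPFTP) are illustrative placeholders.

    import queue
    import threading

    NoPFTP = 4                       # dynamically settable limit on concurrent PFTPs
    transfer_queue = queue.Queue()   # requests beyond the limit wait here

    def pftp_transfer(request):
        """Placeholder for the PFTP call that stages a file from HPSS to the disk cache."""
        print("transferring", request["hpss_path"], "->", request["cache_path"])

    def worker():
        while True:
            request = transfer_queue.get()    # blocks until a queued request is available
            try:
                pftp_transfer(request)
            finally:
                transfer_queue.task_done()

    # One worker thread per allowed PFTP slot.
    for _ in range(NoPFTP):
        threading.Thread(target=worker, daemon=True).start()

    # The File Catalog supplies the per-file information listed above.
    transfer_queue.put({"hpss_path": "/hpss/star/f1", "cache_path": "/cache/f1",
                        "size": 500_000_000, "tape_id": "T17"})
    transfer_queue.join()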

File Queue Management
- Goals: minimize tape mounts, still respect the order of requests, and do not postpone unpopular tapes forever
- File clustering parameter (FCP): if the file at the top of the queue is on tape T_i and FCP > 1 (e.g. 5), then up to FCP-1 (here 4) more files F(T_i) from tape T_i are selected to be transferred next; then service returns to the file at the top of the queue (see the sketch below)
- The parameter can be set dynamically
(The slide's diagram shows queued requests labeled F(T_i) being pulled forward from their queue positions.)
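The selection rule can be sketched as follows, under an assumed data layout in which each queued request carries a tape_id; the function name next_batch is hypothetical.

    # Sketch: pick the next batch of file requests using the file clustering
    # parameter (FCP). The head of the queue is always honored first, and up to
    # FCP-1 additional requests on the same tape are pulled forward with it, so
    # tape mounts are clustered without starving requests for other tapes.

    def next_batch(pending, fcp):
        """pending: list of requests (dicts with 'name' and 'tape_id'), oldest first."""
        if not pending:
            return []
        head = pending[0]
        batch = [head]
        if fcp > 1:
            same_tape = [r for r in pending[1:] if r["tape_id"] == head["tape_id"]]
            batch.extend(same_tape[:fcp - 1])
        for r in batch:
            pending.remove(r)
        return batch

    pending = [{"name": "f1", "tape_id": "T1"}, {"name": "f2", "tape_id": "T2"},
               {"name": "f3", "tape_id": "T1"}, {"name": "f4", "tape_id": "T1"},
               {"name": "f5", "tape_id": "T2"}]
    print([r["name"] for r in next_batch(pending, fcp=5)])   # ['f1', 'f3', 'f4']
    print([r["name"] for r in next_batch(pending, fcp=5)])   # ['f2', 'f5']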

File Caching Order for different File Clustering Parameters
(Two panels: File Clustering Parameter = 1; File Clustering Parameter = 10)

Transfer Rate (Tr) Estimates
- Tr is needed to estimate the total time of a query
- Tr is an average over recent file transfers, measured from the time the PFTP request is made to the time the transfer completes; this includes mount time, seek time, the read to the HPSS RAID, and the transfer to the local cache over the network
- For a dynamic network speed estimate: check the total bytes for all files being transferred over small intervals (e.g. 15 sec) and calculate a moving average over n intervals (e.g. 10 intervals), as in the sketch below
- Using this, the actual time spent in HPSS can be estimated
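A minimal sketch of the moving-average rate estimate, using the interval length and window size given as examples on the slide; the cumulative byte counter fed to sample() is a placeholder for the sum over all files currently being transferred.

    # Sketch: estimate the network transfer rate as a moving average of bytes
    # moved per sampling interval (e.g. 15 s), averaged over the last n
    # intervals (e.g. 10).

    from collections import deque

    class TransferRateEstimator:
        def __init__(self, interval_sec=15, n_intervals=10):
            self.interval_sec = interval_sec
            self.samples = deque(maxlen=n_intervals)   # bytes moved in each interval
            self.last_total = 0

        def sample(self, total_bytes_so_far):
            """Call once per interval with the cumulative byte count for all transfers."""
            self.samples.append(total_bytes_so_far - self.last_total)
            self.last_total = total_bytes_so_far

        def rate(self):
            """Average transfer rate (bytes/sec) over the recorded intervals."""
            if not self.samples:
                return 0.0
            return sum(self.samples) / (len(self.samples) * self.interval_sec)

    est = TransferRateEstimator()
    for total in (0, 150_000_000, 310_000_000, 460_000_000):   # cumulative bytes every 15 s
        est.sample(total)
    print("estimated rate: %.1f MB/s" % (est.rate() / 1e6))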

Dynamic Display of Various Measurements

Query Estimate
Given the transfer rate Tr, and a query for which:
- X files are already in cache
- Y files are in the queue
- Z files are not scheduled yet
Let s(file_set) be the total byte size of all files in file_set. Then:
- If Z = 0: QuEst = s(Y)/Tr
- If Z > 0: QuEst = (s(Y) + q·s(Z))/Tr, where q is the number of active queries
(A worked sketch of this estimate follows.)
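A minimal sketch of the estimate as reconstructed above: cached files (X) contribute nothing, queued files (Y) cost s(Y)/Tr, and unscheduled files (Z) are weighted by the number of active queries q. The file sizes, rate, and q below are invented for illustration.

    # Sketch of the query time estimate QuEst.

    def size(files):
        """Total byte size of a set of files."""
        return sum(f["size"] for f in files)

    def query_estimate(y_files, z_files, tr, q):
        if not z_files:                      # Z = 0
            return size(y_files) / tr
        return (size(y_files) + q * size(z_files)) / tr

    y = [{"size": 500_000_000}]              # one queued 500 MB file
    z = [{"size": 1_000_000_000}] * 2        # two unscheduled 1 GB files
    print("%.0f seconds" % query_estimate(y, z, tr=10_000_000, q=3))   # 650 seconds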

Reason for q·s(Z)
- 20 queries of length ~20 minutes launched 20 minutes apart: estimate pretty close
- 20 queries of length ~20 minutes launched 5 minutes apart: estimate bad, requests accumulate in the queue

Error Handling
Five generic errors are handled (a retry-policy sketch follows this list):
- File not found: return an error to the caller
- PFTP limit reached: re-queue the request, try later (1-2 min)
- Can't login: re-queue the request, try later (1-2 min)
- HPSS error (I/O, device busy): remove the partial file from cache, re-queue, try n times (e.g. 3), then return the error transfer_failed
- HPSS down: re-queue the request and retry repeatedly until successful; respond to File_status requests with HPSS_down
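A sketch of the generic retry policy described above. The status strings, the requeue callback, and the function name handle_result are placeholders, not the Cache Manager's actual interface.

    # Sketch: map each generic error class to a retry decision for one
    # finished transfer attempt.

    def handle_result(status, request, requeue, retries, max_hpss_retries=3):
        """Decide what to do after a transfer attempt returns 'status'."""
        if status == "ok":
            return "done"
        if status == "file_not_found":
            return "error_to_caller"
        if status in ("pftp_limit_reached", "login_failed"):
            requeue(request, delay_sec=90)        # try again in 1-2 minutes
            return "requeued"
        if status == "hpss_io_error":             # I/O error, device busy
            if retries + 1 >= max_hpss_retries:
                return "transfer_failed"
            requeue(request)                      # partial file removed from cache first
            return "requeued"
        if status == "hpss_down":
            requeue(request)                      # retry until HPSS is back;
            return "requeued_hpss_down"           # File_status requests get HPSS_down
        return "unknown_error"

    print(handle_result("hpss_io_error", {"file": "f1"},
                        requeue=lambda req, delay_sec=0: None, retries=0))   # 'requeued'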

Summary
The HPSS Resource Manager:
- insulates applications from transient HPSS and network errors
- limits concurrent PFTPs to HPSS
- manages the queue to minimize tape mounts
- provides file/query time estimates
- handles errors in a generic way
The same API can be used for any MSS, such as Unitree, Enstore, etc.