The Run 2 ATLAS Analysis Event Data Model


The Run 2 ATLAS Analysis Event Data Model
Marcin Nowak, BNL
On behalf of the ATLAS Analysis Software Group and Event Store Group
16th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT), September 2014, Prague

Overview
- ATLAS Run 1 analysis data model: the original design, the actual analysis model during Run 1, problem areas and things we wanted to improve
- The new model for Run 2: design and implementation
- Persistency: ROOT file structure, a simple code example, schema evolution, performance

The ATLAS Run 1 Event Data Model
- The event reconstruction process produces data in the AOD format (Analysis Object Data): the official ATLAS-wide event representation with reduced information for physics analysis
  - Fully object-oriented (complex) EDM, part of Athena, the ATLAS offline software framework
  - Size: 350-400 KB per event
- Persistency: object based
  - A separate persistent data model allowed the transient EDM to evolve freely without compromising backward compatibility
  - Statically defined persistent object shape (schema)
  - The persistent data format required Athena (or at least its persistency layer) to read AOD, even though the files were in ROOT format
  - Quite a lot of libraries needed (dictionaries, converters)
- "Frozen Tier-0" policy: reconstruction fixes not part of the original AOD needed to be redone every time
- AOD reading was too slow for many physicists: Athena startup, object reading and AODfix overheads
- The majority of users turned to intermediate data formats (DPD): working groups started producing their own private Derived Physics Data datasets, readable directly from ROOT

ATLAS Analysis Model During Run 1
- Fix: reconstruction fixes applied on top of the results from the frozen central production
- CP tools: combined performance analysis tools for end-user analysis
- DPDs were produced on request only, introducing a delay with respect to the central AOD production
- The DPD data format differed from the one in Athena, causing duplication of software tools
  - DPD-based tools also differed between groups: hard to share code and compare results
- Combined DPD size: ~3x the size of the corresponding AOD

Coming up with a Better Model for Run 2
- The first Long Shutdown of the LHC created an opportunity to rethink and redesign the analysis data model, based on the experience from Run 1
- New model design requirements:
  - Prepare for the increased data rate in Run 2 (~2x that of Run 1)
  - Flexibility in balancing CPU vs. disk space requirements
  - Maintain the I/O performance of standalone ROOT: >1 kHz
  - Enable reading of single attributes
  - Data directly analyzable in ROOT
  - Reduce the latency of delivering data to the end users
  - Make code sharing possible between groups and between Athena and DPDs, promoting collaboration between groups
  - Maintain the ability to read the full AOD in the Athena environment, with access to the calibration databases
- Proposal: merge AOD and DPD into a new format called xAOD

Introducing the xAOD Format
- Replacement for both AOD and DPD data for Run 2
  - Produced as the end result of the Athena-based reconstruction: full xAOD data available without delay
  - Used as both input and output for physics group productions: xAOD allows reduction of content without changing the format
- Can be created and read in standalone ROOT
  - Lightweight: the number of libraries is kept to a minimum
- Single, object-oriented API
  - From the user's point of view, just like the old AOD
  - Special implementation with respect to class data members
  - Software tools using the single common API can function in both frameworks
- Dynamic xAOD object shape
  - Data members can be added at runtime or removed during copying
- Single transient/persistent representation
  - No longer a fixed class shape as before; no separate persistent data model
  - Ability to read single attributes

The New Analysis Model for Run 2
- The xAOD data format is delivered by central production, directly usable for analysis in Athena and ROOT (lower path)
- A Reduction Framework produces reduced-size data samples for the analysis groups (upper path)
- CP tools: combined performance analysis tools for end-user analysis, both in Athena and ROOT
- The final analysis stage is done in pure ROOT, primarily on local resources

xAOD Format: Basic Design
- xAOD objects consist of an interface object and a storage container
  - Not to be confused with a class interface; it is more like a proxy
  - From the user's point of view, the interface object is the only visible object, but usually it has no data members itself
  - Data members are stored in the storage container
- There is one storage container per collection of objects
  - ATLAS collections in Athena are implemented using the DataVector class
- Storage containers keep arrays of attributes
  - An apparent array-of-structs is actually represented in memory as a struct-of-arrays
  - Interesting implications for vectorization and I/O
- Single objects have their own dedicated storage container with 1-element arrays, and can be added to or removed from collections
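The interface-object/storage-container split above can be sketched in plain C++. The JetStore and Jet types below are illustrative stand-ins, not the actual ATLAS classes: the store holds one array per attribute (struct-of-arrays), while the "interface object" carries no payload of its own, only a pointer into the store and an index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative struct-of-arrays store: one vector per attribute;
// index i across all vectors represents the i-th jet.
struct JetStore {
    std::vector<float> pt;
    std::vector<float> eta;

    std::size_t size() const { return pt.size(); }
    void push_back(float p, float e) {
        pt.push_back(p);
        eta.push_back(e);
    }
};

// Lightweight "interface object": no data members of its own,
// just a proxy pointing at the store, as in the xAOD design.
struct Jet {
    const JetStore* store;
    std::size_t index;
    float pt() const { return store->pt[index]; }
    float eta() const { return store->eta[index]; }
};

// Hypothetical helper building a two-jet demo store.
inline JetStore makeDemoStore() {
    JetStore s;
    s.push_back(50.0f, 1.2f);
    s.push_back(30.0f, -0.7f);
    return s;
}
```

Because each attribute lives in its own contiguous array, reading a single attribute touches one array only, which is what makes per-branch I/O and vectorization natural in this layout.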

xAOD Implementation: Data Stores
- xAOD objects have a set of pre-defined (static) attributes
- Storage containers assigned to these types have data arrays to store their static attributes:

  class JetAuxContainer_v1 : public AuxContainerBase {
  public:
     JetAuxContainer_v1();
  private:
     std::vector<float> pz;  // storage array for the pz attribute
  };

  JetAuxContainer_v1::JetAuxContainer_v1() {
     AUX_VARIABLE( pz );
  }

- The constructor uses a macro to automatically register the data arrays in the Registry; arrays can later be looked up by their names
- Additional (dynamic) object attributes can be added at any time
  - They are kept in a storage container extension that allocates storage arrays as needed
- Type-specific storage containers are only an optimization! Technically, all the different xAOD types could use just the dynamic store
- Dynamic attributes may be selectively dropped when writing to a file
  - 3-level selection lists in Athena: by object type / name / attribute
  - A static store may be converted to a dynamic one in order to drop static attributes
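The dynamic store described above can be approximated with a name-keyed map of arrays. The DynamicStore class below is an illustrative sketch, not the real AuxStore interface: attribute arrays are allocated on first use, and dropping an attribute before writing is simply erasing its array.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Illustrative dynamic auxiliary store: attribute arrays are created
// on first access and looked up by name, mirroring how xAOD-style
// dynamic attributes can be attached to a container at runtime.
class DynamicStore {
public:
    // Returns the storage array for "name", allocating it (sized to n
    // elements, one per object in the collection) on first access.
    std::vector<float>& attribute(const std::string& name, std::size_t n) {
        std::vector<float>& arr = arrays_[name];
        if (arr.size() != n) arr.resize(n, 0.0f);
        return arr;
    }

    bool has(const std::string& name) const {
        return arrays_.count(name) != 0;
    }

    // Selectively dropping an attribute (e.g. when writing a reduced
    // derivation) just removes its array.
    void drop(const std::string& name) { arrays_.erase(name); }

private:
    std::map<std::string, std::vector<float>> arrays_;
};
```

In this picture a "static" store is merely a store whose arrays are known at compile time; as the slide notes, everything could in principle live in the dynamic map, at the cost of name lookups and per-attribute allocations.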

xAOD Implementation: Data Store Access
- xAOD object data is stored in the storage container
- The interface object uses getter and setter methods to access the static attributes
- Accessors are provided to make these methods fast:

  float Jet_v1::pz() const {
     static Accessor<float> pz_acc( "pz" );
     return pz_acc( *this );
  }

  void Jet_v1::setPz( float pz ) {
     static Accessor<float> pz_acc( "pz" );
     pz_acc( *this ) = pz;
  }

- Accessors use the attribute type and name for initialization (a lookup in the Registry)
  - C++ static storage ensures the (slow) identifier lookup is done only once
  - After initialization, the accessor provides direct access to the storage array
- Accessors are attribute-specific, not object-specific: the object they access needs to be specified on every use (still fast)
- Accessors for dynamic attributes can be declared anywhere in the user code, also as C++ statics
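The accessor pattern, paying the registry lookup once via a function-local static and then indexing the storage array directly, can be shown with a self-contained sketch. Registry, Store and Accessor below are simplified stand-ins for the real xAOD machinery (float attributes only, no type checking):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Illustrative attribute registry: maps attribute names to column ids.
// The real registry also records the attribute type.
struct Registry {
    std::map<std::string, std::size_t> ids;
    std::size_t lookup(const std::string& name) {
        auto it = ids.find(name);
        if (it != ids.end()) return it->second;
        std::size_t id = ids.size();
        ids[name] = id;
        return id;
    }
    static Registry& instance() { static Registry r; return r; }
};

// Column-wise store: one array per registered attribute.
struct Store {
    std::vector<std::vector<float>> columns;
    float& get(std::size_t attrId, std::size_t index) {
        if (columns.size() <= attrId) columns.resize(attrId + 1);
        std::vector<float>& col = columns[attrId];
        if (col.size() <= index) col.resize(index + 1, 0.0f);
        return col[index];
    }
};

// Accessor: the (slow) name lookup happens once, at construction;
// afterwards access is plain array indexing. The accessed object
// (store + index) is supplied on every call.
class Accessor {
public:
    explicit Accessor(const std::string& name)
        : id_(Registry::instance().lookup(name)) {}
    float& operator()(Store& s, std::size_t index) const {
        return s.get(id_, index);
    }
private:
    std::size_t id_;
};

inline float readPz(Store& s, std::size_t i) {
    // "static" ensures the registry lookup runs only once per process,
    // as in the Jet_v1::pz() getter above.
    static const Accessor pzAcc("pz");
    return pzAcc(s, i);
}

inline void writePz(Store& s, std::size_t i, float v) {
    static const Accessor pzAcc("pz");
    pzAcc(s, i) = v;
}
```

The same mechanism works for dynamic attributes: a user can declare an Accessor for a name the store has never seen, and the array springs into existence on first write.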

xAOD: Persistency
- xAOD data files are created both by Athena and by standalone ROOT
  - Files coming from both sources need to be readable by ROOT and, in particular, allow single-attribute reading
- The Athena persistency layer had to be modified:
  - Historically, Athena used object-based I/O with a fixed class schema defined in dictionaries, which is not possible for the dynamic store!
  - Single-attribute reading from the static store needed tuning of the ROOT split level
- Solution: writing xAOD collections by components:
  - The collection (DataVector) of interface objects: stored as an object
  - The static part of the storage container: stored as an object (object-based storage requires a ROOT class dictionary)
  - Dynamic attributes: stored in dedicated TTree branches, created as needed
- The storage container provides a uniform API for accessing the storage of both static and dynamic attributes
  - For both attribute types, the I/O API delivers the storage array plus the type information
  - This opens interesting options for converting the object shape during writing
- Reading of dynamic attributes is implemented with a dedicated storage container
  - Empty in the beginning, with attributes read transparently when accessed
- Note: dynamic attributes can make files containing the same data types have different TTree structures, which can be a surprise when trying to merge files!

xAOD File in the TBrowser
Inspecting an xAOD file produced during the ATLAS Data Challenge 2014 using the ROOT TBrowser: the InDetTrackParticles collection, i.e. the interface object with no attributes.

xAOD File in the TBrowser (2)
Static attributes in the InDetTrackParticles storage container's TBranches.

xAOD File in the TBrowser (3)
InDetTrackParticles dynamic attributes in standalone TBranches.

Standalone ROOT xAOD Code Example
Lightweight and simple access to xAOD data from user code in ROOT:

  #include <iostream>
  #include <TFile.h>
  #include "xAODRootAccess/Init.h"
  #include "xAODRootAccess/TEvent.h"
  #include "xAODMuon/MuonContainer.h"

  int main() {
     xAOD::Init();
     TFile* file = TFile::Open( "xAOD.root", "READ" );
     xAOD::TEvent event;
     event.readFrom( file );
     for( Long64_t entry = 0; entry < event.getEntries(); ++entry ) {
        event.getEntry( entry );
        const xAOD::MuonContainer* muons = 0;
        event.retrieve( muons, "Muons" );
        std::cout << "1st muon pt = " << muons->at(0)->pt() << std::endl;
     }
     return 0;
  }

xAOD: Schema Evolution
- In Run 1, support for schema evolution in Athena had a big impact on the persistent data format
  - The design was tailored specifically to Athena, operating on a whole object at a time
  - Maintaining two separate data models and the necessary converters required effort and expertise
  - The model fitted reconstruction better than analysis
- For Run 2, the xAOD is both the transient and the persistent EDM
  - xAOD classes have a version number in the class name, e.g. Jet_v1
  - Serious changes in the class schema will increase the version number
  - Athena can read an old version if class converter support is provided (as in Run 1)
  - The end user sees the class name without the version (via a typedef)
- For standalone ROOT, no dedicated support for schema evolution is foreseen
  - Except what we can get from ROOT itself: ROOT support for schema evolution is much better now than 10 years ago, but it is class-based, so dynamic attributes have limited schema evolution support
  - Users always work with the current EDM; no backward compatibility
- Athena will continue to use its conversion layer
  - Used in general, not only for schema evolution
  - Can be used for schema evolution, but only when reading
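The versioned-class-plus-typedef scheme described above is easy to illustrate. The minimal Jet_v1 below is a stand-in, not the real class: user code refers only to the unversioned Jet name, so when a Jet_v2 with a changed schema appears, retargeting the typedef keeps analysis code source-compatible while converters handle old files.

```cpp
#include <cassert>

// Illustrative versioned EDM class. A serious schema change would
// introduce a Jet_v2 alongside a converter able to read Jet_v1.
class Jet_v1 {
public:
    float pt() const { return pt_; }
    void setPt(float p) { pt_ = p; }
private:
    float pt_ = 0.0f;
};

// End users see only the unversioned name; bumping the version means
// retargeting this typedef, not editing every analysis.
typedef Jet_v1 Jet;
```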

xAOD: Performance
- Performance gains:
  - No conversion to/from a persistent EDM during I/O
  - Data members arranged column-wise
  - Dynamic attributes read only on demand
- Potential trouble areas:
  - Large number of top-level branches in the TTree: one per dynamic attribute
  - Read-everything mode has more overhead because of the dynamic attributes (the main reason for not storing all attributes in the dynamic format)
- Observed results (ATLAS Data Challenge 2014):
  - xAOD files are larger than Run 1 files by ~20%, but there will be no duplication between AOD and DPD
  - The size increase depends on the data type: in the worst case almost 2x larger, but some types also become smaller
  - The difference is attributed to the absence of the T/P converters, which were compressing data
  - In ROOT, reading selected attributes runs at >1 kHz; interactive ROOT is very responsive
  - In Athena, development is still ongoing (changes to EventInfo), so there is not much reliable performance data yet

Summary
- We implemented a new data format that allows adding and removing object properties at runtime in a flexible way
  - In collaboration with the ROOT team
  - We hope to use the model for vectorization
- The full reconstruction code was rewritten to use xAOD
- We are currently teaching the collaboration members to use the new data format, giving a series of tutorials
  - The first response is positive
- Files can be accessed without the full ATLAS offline software: the format is readable with ROOT using only ~100 MB of xAOD libraries
- The ATLAS Data Challenge 2014 is under way with the new data format