CouchDB-based system for data management in a Grid environment: Implementation and Experience


Hassen Riahi, IT/SDC, CERN

Outline
- Context
- Problem statement and strategy
- System architecture
- Integration and deployment models
- Experience and lessons learnt

Who am I?
- Seven years of experiment distributed computing support
- Working on the implementation and integration of data movement and monitoring solutions for the experiments at CERN

CERN and the Large Hadron Collider experiments
- The Large Hadron Collider (LHC) is a particle accelerator
- It collides beams of protons at a design energy of 14 TeV
- It has a circumference of 27 km and is located about 100 m underground
- It hosts four major detectors: ALICE, ATLAS, CMS and LHCb

WLCG: Worldwide LHC Computing Grid (figure of the Grid layout, from the online system and the CERN centre outwards)

Use case: distributed data analysis in CMS
- 1000 individual users per month
- More than 60 sites
- 20k jobs/hour, typically 1 file per job
- Files vary in size
- 200k completed jobs per day
- Minimal latencies
- Chaotic environment

Outline
- Context
- Problem statement and strategy
- System architecture
- Integration and deployment models
- Experience and lessons learnt

Problem statement
- 15% to 20% of jobs fail, and about 30% to 50% of those failures are due to jobs not being able to upload their output data to remote disk storage
- Between 5% and 10% of all jobs therefore fail in the remote copy of their outputs
  - The overall CPU loss is even higher than 5-10%, since those jobs fail at the end of the processing
  - It often results in a denial of service on CMS Tier-2 storage systems
- AsyncStageOut (ASO) was implemented to reduce this most common failure mode of analysis jobs

Asynchronous stage-out strategy (figure)

ASO algorithm
1. The analysis jobs copy their outputs locally to the temp area of the local storage
2. The transfer requests are uploaded into ASO from a data source (Worker Nodes, Workload Management system, ...)
3. The ASO tool:
   1. Creates, schedules and manages jobs that transfer the user files from the local storage to the target destination
   2. Manages the publication of the transferred files into the experiment's data catalogue
   3. Updates the status of each file
4. The output is available to the user
A minimal sketch of such a polling loop is given below.
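The sketch below shows the shape of an ASO-like transfer component: it polls a CouchDB view for new transfer documents, hands them to a transfer backend and writes the status back. The database name, view name, document fields and the submit_fts_transfer() helper are assumptions for illustration, not the actual ASO code (which is in the GitHub repository listed in the references).

```python
# Minimal sketch of an ASO-like transfer component polling CouchDB.
# Hypothetical: database/view names, document fields, submit_fts_transfer().
import time
import requests

COUCH = "http://localhost:5984"
DB = "asynctransfer"  # hypothetical database name


def submit_fts_transfer(source_lfn, destination):
    """Placeholder for the real transfer submission (e.g. via FTS)."""
    print(f"transferring {source_lfn} -> {destination}")
    return True


def process_new_transfers():
    # Query a (hypothetical) view listing documents in state 'new'
    view = f"{COUCH}/{DB}/_design/AsyncTransfer/_view/new_transfers"
    rows = requests.get(view, params={"include_docs": "true"}).json()["rows"]
    for row in rows:
        doc = row["doc"]
        ok = submit_fts_transfer(doc["source_lfn"], doc["destination"])
        doc["state"] = "done" if ok else "retry"
        # Write the updated status back; CouchDB rejects it on revision conflict
        requests.put(f"{COUCH}/{DB}/{doc['_id']}", json=doc)


if __name__ == "__main__":
    while True:
        process_new_transfers()
        time.sleep(60)  # poll once per minute
```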

Outline
- Context
- Problem statement and strategy
- System architecture
- Integration and deployment models
- Experience and lessons learnt

Why CouchDB?
- Fast prototyping of new systems thanks to the schema-less nature of CouchDB
- Fast implementation of the Web monitoring
  - No particular deployment of the monitoring is required, since it is encapsulated into CouchDB
- New types of data can be incorporated rapidly
- Easy communication with external tools through the CouchDB REST interface (see the sketch below)
- The easy replication and the integrated caching of CouchDB should provide a highly scalable and available system to face the new challenges
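As an illustration of that REST interface, everything is plain HTTP plus JSON; the database name and document fields below are hypothetical.

```python
# Talking to CouchDB over its REST interface (hypothetical database/fields).
import requests

COUCH = "http://localhost:5984"
DB = "asynctransfer"

# Create the database (idempotent PUT; returns 412 if it already exists)
requests.put(f"{COUCH}/{DB}")

# Insert a transfer request as a plain JSON document
doc = {"_id": "job1234_output.root", "state": "new", "destination": "T2_IT_Pisa"}
print(requests.put(f"{COUCH}/{DB}/{doc['_id']}", json=doc).json())

# Read it back
print(requests.get(f"{COUCH}/{DB}/{doc['_id']}").json())
```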

Implementation and technologies
- Implemented in Python as a standalone tool with a modular approach
- Organized as a set of loosely coupled components communicating through a database
- Relies only on CouchDB as input and data storage
- Highly configurable: max_transfer_retry, max_files_per_transfer, data_source, ... (an illustrative configuration sketch follows)
- Plugin-based architecture for data placement and bookkeeping
- Independent of Grid/experiment technologies
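A hedged sketch of what such a configuration could look like; only the three parameter names quoted on the slide come from the source, the values and the remaining keys are hypothetical.

```python
# Illustrative ASO-style configuration; values and extra keys are hypothetical.
ASO_CONFIG = {
    "couch_instance": "http://localhost:5984",
    "files_database": "asynctransfer",
    "data_source": "couchdb",        # where transfer requests are read from
    "max_transfer_retry": 3,         # retries before a file is marked failed
    "max_files_per_transfer": 100,   # files grouped into one transfer job
    "transfer_plugin": "FTS",        # pluggable data-placement backend (hypothetical name)
    "bookkeeping_plugin": "Catalogue",  # pluggable publication backend (hypothetical name)
}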

Architecture (figure)

Transfer document (figure)
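The slide showed an actual transfer document, which is not reproduced in this transcription. A document of this kind could look roughly like the following; every field name here is illustrative.

```python
# Roughly what a transfer document could look like (illustrative fields only).
transfer_doc = {
    "_id": "user.jdoe:output_42.root",
    "user": "jdoe",
    "source_lfn": "/store/temp/user/jdoe/output_42.root",
    "destination_lfn": "/store/user/jdoe/output_42.root",
    "source": "T2_CH_CERN",
    "destination": "T2_IT_Pisa",
    "state": "new",             # e.g. new -> acquired -> done / failed
    "retry_count": 0,
    "publish": True,            # whether to publish into the data catalogue
    "last_update": 1415000000,  # Unix timestamp
}
```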

Monitoring implementation
- The Map/Reduce views of CouchDB are visualized with Protovis
  - Migration to D3.js is ongoing
- The monitoring application is embedded in the CouchDB server as a CouchApp
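A monitoring view of this kind could, for example, count transfer documents per state and destination and be queried grouped, which is the shape of data the plots are built from. The design-document name and the emitted keys below are assumptions.

```python
# A monitoring-style view: count transfer documents per (state, destination).
# Design-document name and emitted keys are illustrative.
import requests

COUCH, DB = "http://localhost:5984", "asynctransfer"

design = {
    "_id": "_design/monitor",
    "views": {
        "files_by_state": {
            "map": "function(doc) { emit([doc.state, doc.destination], 1); }",
            "reduce": "_count",  # built-in reducer, computed incrementally
        }
    },
}
requests.put(f"{COUCH}/{DB}/_design/monitor", json=design)

# Grouped query: one row per (state, destination), ready to be plotted
rows = requests.get(
    f"{COUCH}/{DB}/_design/monitor/_view/files_by_state",
    params={"group": "true"},
).json()["rows"]
```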

Some monitoring plots (figures)

Outline
- Context
- Problem statement and strategy
- System architecture
- Integration and deployment models
- Experience and lessons learnt

Integration (figure)

Authentication/Validation
- Authentication with CouchDB is performed using the X.509 proxy certificate, through a custom authentication handler
- Document updates are validated inside the database (a sketch is given below)
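The validation code shown on the slide is not reproduced here. As a sketch of the mechanism: CouchDB lets a design document carry a validate_doc_update function that rejects unwanted writes, and a Python client can present an X.509 proxy as a client certificate over HTTPS. Host, paths and field names below are assumptions.

```python
# Sketch only: reject updates made by a user other than the document owner,
# and present an X.509 proxy as client certificate. Names are illustrative.
import requests

validate = """
function(newDoc, oldDoc, userCtx) {
  if (oldDoc && oldDoc.user !== userCtx.name) {
    throw({forbidden: 'only the owner may update this document'});
  }
}
"""
design = {"_id": "_design/validation", "validate_doc_update": validate}

requests.put(
    "https://couch.example.org:6984/asynctransfer/_design/validation",
    json=design,
    cert="/tmp/x509up_u1000",                      # proxy file: cert + key
    verify="/etc/grid-security/certificates",      # CA certificates
)
```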

Deployment models (figure)

Outline
- Context
- Problem statement and strategy
- System architecture
- Integration and deployment models
- Experience and lessons learnt

Commissioning tests
- Application and CouchDB were deployed in a single VM with 8 vCPUs, 15 GB of RAM and 200 GB of disk storage
- Scaled up to 1.5 times the production load (20k files/h, 200k completed files/day), i.e. 300k files/day, injecting ~100k files every 8 hours

Commissioning experience
- ASO can manage load peaks of more than 20k files/h without critical errors or crashes
- Nearly 60 GB of disk storage were consumed by CouchDB, more than 90% of it for view caching
- The average CPU idle time was almost stable at 90%
- The RAM was almost fully consumed during the processing time
- Most probably the delays seen would be reduced by increasing the RAM available for accessing the cached views

Production results
- ASO has been in production since June 2014
- More than 800 TB transferred during the last 3 months
- Peak of 750k files per week

Production environment
Hardware:
- 2 physical nodes (migration to VMs is ongoing)
- CouchDB: 8 cores and 24 GB RAM
- ASO application: 24 cores and 32 GB RAM
ASO database size:
- Average database size: 20 GB
- 3 design docs, 33 views
- Average number of docs: 800k
Operation:
- Upgrade: 1 time/month
- Compaction: 2 times/day

Problem 1: Database upgrade
- During this first phase of production, the database needs to be upgraded once per month to include new features
- CouchDB spends more than 24 hours on view index generation
- ASO is off during this operation, so CMS users cannot perform physics analysis for more than 1 day

Solution
1) Start a fresh CouchDB instance at each upgrade while keeping the old one running
   - Requires development to support load balancing over separate CouchDB instances
   - Increases CouchDB operation effort
2) Upload the new design document into a replicated database, trigger the view index generation offline and switch once it is completed (a sketch of this procedure is given below)
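Option 2 maps onto standard CouchDB calls: one-shot replication via the _replicate endpoint, then a plain view query on the copy to force the index build before switching. Database, design-document and view names below are assumptions.

```python
# Sketch of option 2: replicate, push the new design doc, build the index
# offline, then switch ASO to the new database. Names are illustrative.
import requests

COUCH = "http://localhost:5984"

new_design_doc = {
    "_id": "_design/AsyncTransfer",
    "views": {
        "new_transfers": {
            "map": "function(doc) { if (doc.state == 'new') emit(doc._id, null); }"
        }
    },
}

# 1. One-shot replication of the production database into a new one
requests.post(f"{COUCH}/_replicate",
              json={"source": "asynctransfer",
                    "target": "asynctransfer_new",
                    "create_target": True})

# 2. Upload the new design document into the copy
requests.put(f"{COUCH}/asynctransfer_new/_design/AsyncTransfer", json=new_design_doc)

# 3. Query any view of the new design doc: this triggers index generation
requests.get(f"{COUCH}/asynctransfer_new/_design/AsyncTransfer/_view/new_transfers",
             params={"limit": 1})

# 4. Once the index is built, point ASO at asynctransfer_new
```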

Problem 2: Compaction
- The I/O wait time is often close to 10% on 8 cores: frequent I/O bottlenecks
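For context, CouchDB exposes compaction and view clean-up as REST calls, so the twice-daily compaction mentioned above can be driven by a small cron-style script like the sketch below; database and design-document names are assumptions.

```python
# Sketch of a compaction script (run e.g. from cron twice a day).
# Database and design-document names are illustrative.
import requests

COUCH, DB = "http://localhost:5984", "asynctransfer"
HEADERS = {"Content-Type": "application/json"}  # required by the _compact endpoints

# Compact the database file itself
requests.post(f"{COUCH}/{DB}/_compact", headers=HEADERS)

# Compact the view index of a given design document
requests.post(f"{COUCH}/{DB}/_compact/AsyncTransfer", headers=HEADERS)

# Remove index files of views that no longer exist
requests.post(f"{COUCH}/{DB}/_view_cleanup", headers=HEADERS)
```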

Map function improvement
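The actual view code shown on this slide is not reproduced in the transcription. As a generic illustration of the kind of change that shrinks a map output: stop emitting the whole document into the index and emit only the small value the consumer needs (callers that need the full document can pass include_docs=true instead).

```python
# Generic illustration only, not the ASO view code from the slide.

# Before: the whole document is copied into the index for every row.
map_before = """
function(doc) {
  if (doc.state === 'new') { emit(doc.destination, doc); }
}
"""

# After: emit only the field the consumer needs; the index stays small.
map_after = """
function(doc) {
  if (doc.state === 'new') { emit(doc.destination, doc.source_lfn); }
}
"""
```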

Reduce function improvement
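Again a generic illustration rather than the code from the slide: a custom JavaScript reduce can often be replaced by one of CouchDB's built-in reducers, which are evaluated natively and keep the reduced values compact.

```python
# Generic illustration only, not the ASO view code from the slide.

# Before: hand-written JavaScript reduce summing the emitted values.
reduce_before = """
function(keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) { total += values[i]; }
  return total;
}
"""

# After: the equivalent built-in reducer, much cheaper to evaluate.
reduce_after = "_sum"
```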

Results

                                Time for index generation   Total number of views   Views size
Initially                       28 hours                    33                      30 GB
After views clean-up            17 hours                    25                      17 GB
After views code improvement    25 minutes                  30                      15 GB

Conclusions
- Fast prototyping of the system and of the Web monitoring
- The system has shown satisfactory performance
- Database operation issues are understood; they are mainly addressed by improving the view code
- Promising technology for other applications (data analytics, data mining, ...)
- Looking forward to your feedback and suggestions!

References
- CERN: http://home.web.cern.ch/
- CMS: http://cms.web.cern.ch/
- ASO: https://github.com/dmwm/asyncstageout
- Protovis: http://mbostock.github.io/protovis
- D3.js: http://d3js.org/

Thank you for your attention! hassen.riahi@cern.ch