Monitoring HTCondor with the BigPanDA monitoring package

Similar documents
Evolution of the ATLAS PanDA Workload Management System for Exascale Computational Science

AutoPyFactory: A Scalable Flexible Pilot Factory Implementation

The ATLAS PanDA Pilot in Operation

INDEXING OF ATLAS DATA MANAGEMENT AND ANALYSIS SYSTEM

glideinwms: Quick Facts

Overview of ATLAS PanDA Workload Management

HTCondor Week 2015: Implementing an HTCondor service at CERN

The Fermilab HEPCloud Facility: Adding 60,000 Cores for Science! Burt Holzman, for the Fermilab HEPCloud Team HTCondor Week 2016 May 19, 2016

What s new in HTCondor? What s coming? HTCondor Week 2018 Madison, WI -- May 22, 2018

VC3. Virtual Clusters for Community Computation. DOE NGNS PI Meeting September 27-28, 2017

Experience with ATLAS MySQL PanDA database service

An update on the scalability limits of the Condor batch system

UW-ATLAS Experiences with Condor

Experiences with the new ATLAS Distributed Data Management System

WLCG Transfers Dashboard: a Unified Monitoring Tool for Heterogeneous Data Transfers.

Changing landscape of computing at BNL

EGEE and Interoperation

Storage Resource Sharing with CASTOR.

ATLAS Nightly Build System Upgrade

Clouds in High Energy Physics

ELFms industrialisation plans

Unifying Big Data Workloads in Apache Spark

Evolution of Cloud Computing in ATLAS

CERN: LSF and HTCondor Batch Services

On-demand provisioning of HEP compute resources on cloud sites and shared HPC centers

High Throughput Urgent Computing

What s new in HTCondor? What s coming? European HTCondor Workshop June 8, 2017

AGIS: The ATLAS Grid Information System

Monitoring and Analytics With HTCondor Data

The ATLAS Software Installation System v2 Alessandro De Salvo Mayuko Kataoka, Arturo Sanchez Pineda,Yuri Smirnov CHEP 2015

HammerCloud: A Stress Testing System for Distributed Analysis

The PanDA System in the ATLAS Experiment

PanDA: Exascale Federation of Resources for the ATLAS Experiment

Pegasus WMS Automated Data Management in Shared and Nonshared Environments

Shooting for the sky: Testing the limits of condor. HTCondor Week May 2015 Edgar Fajardo On behalf of OSG Software and Technology

LAB EXERCISE: RedHat OpenShift with Contrail 5.0

glideinwms architecture by Igor Sfiligoi, Jeff Dost (UCSD)

Stat 602 The Design of Experiments

Clouds at other sites T2-type computing

Tier 3 batch system data locality via managed caches

The CORAL Project. Dirk Düllmann for the CORAL team Open Grid Forum, Database Workshop Barcelona, 4 June 2008

Streamlining CASTOR to manage the LHC data torrent

Improved ATLAS HammerCloud Monitoring for Local Site Administration

LCG Conditions Database Project

PROOF-Condor integration for ATLAS

Distributed API Management in a Hybrid Cloud Environment

Teraflops of Jupyter: A Notebook Based Analysis Portal at BNL

OpenShift Roadmap Enterprise Kubernetes for Developers. Clayton Coleman, Architect, OpenShift

Byte Academy. Python Fullstack

Operating the Distributed NDGF Tier-1

Towards a Strategy for Data Sciences at UW

ATLAS Oracle database applications and plans for use of the Oracle 11g enhancements

Flying HTCondor at 100gbps Over the Golden State

Orchestrating Big Data with Apache Airflow

Migration and Building of Data Centers in IBM SoftLayer

Review. Fundamentals of Website Development. Web Extensions Server side & Where is your JOB? The Department of Computer Science 11/30/2015

Red Hat 3scale 2-saas

Utilizing Databases in Grid Engine 6.0

The HTCondor CacheD. Derek Weitzel, Brian Bockelman University of Nebraska Lincoln

Managing large-scale workflows with Pegasus

New Directions and BNL

OpenStack Mitaka Release Overview

IBM Compose Managed Platform for Multiple Open Source Databases

Visita delegazione ditte italiane

NMI End-to-End Diagnostic Advisory Group BoF. Spring 2004 Internet2 Member Meeting

Monitoring for IT Services and WLCG. Alberto AIMAR CERN-IT for the MONIT Team

MCT620 Distributed Systems Module Handbook

System upgrade and future perspective for the operation of Tokyo Tier2 center. T. Nakamura, T. Mashimo, N. Matsui, H. Sakamoto and I.

INTRODUCING CISCO SECURITY FOR AWS

UK Tier-2 site evolution for ATLAS. Alastair Dewhurst

Performance and Load Testing R12 With Oracle Applications Test Suite

Batch Services at CERN: Status and Future Evolution

Trove: The OpenStack DBaaS

AT&T Flow Designer. Current Environment

Using Puppet to contextualize computing resources for ATLAS analysis on Google Compute Engine

Edinburgh (ECDF) Update

METADATA INTERCHANGE IN SERVICE BASED ARCHITECTURE

Launching StarlingX. The Journey to Drive Compute to the Edge Pilot Project Supported by the OpenStack

Introducing the HTCondor-CE

Evaluation of the TimescaleDB PostgreSQL Time Series extension ID: 17834

RobinHood Project Update

CIS-CAT Pro Dashboard Documentation

Minfy MS Workloads Use Case

Improve Web Application Performance with Zend Platform

One Pool To Rule Them All The CMS HTCondor/glideinWMS Global Pool. D. Mason for CMS Software & Computing

The ATLAS EventIndex: an event catalogue for experiments collecting large amounts of data

BIG DATA COURSE CONTENT

2013 AWS Worldwide Public Sector Summit Washington, D.C.

SLATE. Services Layer at the Edge. First Meeting of the National Research Platform Montana State University August 7-8, 2017

O Reilly RailsConf,

New data access with HTTP/WebDAV in the ATLAS experiment

Red Hat HPC Solution Overview. Platform Computing

Developing Enterprise Cloud Solutions with Azure

Data publication and discovery with Globus

Project Name. The Eclipse Integrated Computational Environment. Jay Jay Billings, ORNL Parent Project. None selected yet.

CMS users data management service integration and first experiences with its NoSQL data storage

Real Life Web Development. Joseph Paul Cohen

STATUS OF PLANS TO USE CONTAINERS IN THE WORLDWIDE LHC COMPUTING GRID

arxiv: v1 [physics.comp-ph] 29 Nov 2018

06 March INVESTOR DAY 2018 Sabre GLBL Inc. All rights reserved. 1

Transcription:

Monitoring HTCondor with the BigPanDA monitoring package J. Schovancová 1, P. Love 2, T. Miller 3, T. Tannenbaum 3, T. Wenaus 1 1 Brookhaven National Laboratory 2 Lancaster University 3 UW-Madison, Department of Computer Science HTCondor Week 2014 28 30 April 2014, University of Wisconsin, Madison, WI, USA

Introduction PanDA = Production and Distributed Analysis Start: Aug 2005, in production for US ATLAS: Dec 2005, since 2008 WMS for the whole ATLAS Collaboration T. Maeno, PanDA's Role in ATLAS Computing Model Evolution, ISGC2014 Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 2

Evolution: The Next Generation US DOE ASCR and HEP funded project: Next Generation Workload Management and Analysis System for Big Data, code name: BigPanDA 3 year project since September 2012 to evolve PanDA Workload Management System beyond ATLAS and LHC Factorize core Leverage intelligent networking Extend scope Monitoring and usability Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 3

BigPanDA Monitoring BigPanDAmon package based on Django framework Modular, easy to bring up a new project/vo Clear separation between data access and visualization Provide REST APIs to access object information Runs on top of Oracle or MySQL DB backends Documentation for developers: describes configuration and modules Deployed with RPMs Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 4

HTCondor Monitoring HTCondor is an essential foundation technology for PanDA the engine for PanDA's resource provisioning system (pilot factory) Integrating HTCondor monitoring into PanDA's successful monitoring system can provide value for both HTCondor monitoring has been implemented as a first application of BigPanDAmon's modular architecture Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 5

BigPanDA Monitoring Modules Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 6

Monitoring Traffic Condor sched lib plugin to send updates to PanDA mon Asynchronous w.r.t. scheduler More technical details in Todd Tannenbaum's talk Connection to PanDA API with frequency 1 Hz Bulk Update, Add, Flag deleted Avg. 2-5 records/s, Peak: 100s records/s Currently HTTPS (X509) authentication Updated information available in the monitoring Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 7

PanDA HTCondor REST API Bulk-operation API resource Resource /v2/api-auth/htcondor/jobs/ HTTP verb Purpose Description POST create Bulk create new HTCondor job. GET read Bulk list HTCondor jobs. PUT update Bulk update HTCondor jobs. DELETE delete Bulk flag of records which were deleted from the HTCondor. Single job API resource is not needed for now. Perhaps we will need a single-job GET later on. Nice and useful reading: http://apigee.com/about/api-best-practices/api-design/ebook Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 8

HTCondor Mon list jobs Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 9

HTCondor Mon job filter Filter by all visible columns Bookmark filtered view, send to a collaborator Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 10

HTCondor Mon job details Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 11

Monitoring Wish list HTTPS HTTP with IP/host restriction of agents Schema PanDA mon side: use classads attributes (schema-less) Condor side: minimize data translation RPCs Provide list of compulsory/available attributes in API Scalability Leverage experience with Redis Webserver performance tuning (Apache2/WSGI, nginx/gunicorn) Summaries Provide parameterized summaries/plots beside job lists Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 12

Summary Fruitful collaboration between PanDA and HTCondor teams! First version of the HTCondor monitor developed API v1 used on a production machine at UW cluster More updates yet to come. Monitoring HTCondor with the BigPanDAmon package, HTCondor Week 2014 13

Monitoring HTCondor with the BigPanDA monitoring package J. Schovancová 1, P. Love 2, T. Miller 3, T. Tannenbaum 3, T. Wenaus 1 1 Brookhaven National Laboratory 2 Lancaster University 3 UW-Madison, Department of Computer Science HTCondor Week 2014 28 30 April 2014, University of Wisconsin, Madison, WI, USA