
New Directions and HTCondor @ BNL
USATLAS Tier-3 & New Computing Directives
William Strecker-Kellogg
RHIC/ATLAS Computing Facility (RACF), Brookhaven National Lab
May 2016

RACF Overview
- RHIC Collider at BNL (STAR+PHENIX detectors, ~15000 slots each) and USATLAS T1 (~15k cores)
- ~11 PB replicated dCache disk; ~6 PB central disk (GPFS/NFS); ~20 PB HPSS tape
- ~16 PB distributed disk (dCache/XRootD); ~60 PB HPSS tape
- Three large HTCondor clusters sharing resources via flocking, plus several smaller clusters
- Recent 8.2 → 8.4 update

ATLAS Load Balancing
- Working steadily since last year's talk and the HEPiX Fall 2015 talk
- Keeps occupancy at 95% or above despite a dynamically changing workload, with no human intervention
- Prevents starvation due to competition with larger jobs
- The only remaining inefficiency is due to (de)fragmentation
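The talk does not show the mechanism behind this, but the behavior described (keeping whole machines available for large jobs with no human intervention) is what a condor_defrag-style policy provides. A minimal sketch with illustrative thresholds, not BNL's production values:

    # Run the defrag daemon alongside the existing daemons
    DAEMON_LIST = $(DAEMON_LIST) DEFRAG
    # Drain only a few machines at a time to bound the (de)fragmentation cost
    DEFRAG_MAX_CONCURRENT_DRAINING    = 4
    DEFRAG_DRAINING_MACHINES_PER_HOUR = 2
    # Stop draining once enough "whole" machines are available for large jobs
    DEFRAG_MAX_WHOLE_MACHINES = 10
    DEFRAG_WHOLE_MACHINE_EXPR = Cpus >= 8 && Memory >= 16384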

V8.4 Bug Fixing
- Minor issue: the RemoteGroupResourcesInUse ClassAd attribute did not appear in the negotiation context, which matters for group-based preemption; fixed quickly by Greg (ticket #5593)
- Major issue: the schedd halted for up to an hour, with no disk or network I/O of any kind and no syscalls; GDB showed a mess of STL internals (ticket #5648)
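For reference, RemoteGroupResourcesInUse (together with its Submitter* counterparts) is meant to be evaluated in negotiator policy such as PREEMPTION_REQUIREMENTS. A sketch of the standard group-respecting form, not necessarily BNL's exact expression:

    # Preempt a running job only if its group is over quota while the
    # submitting job's group is still under its own quota.
    PREEMPTION_REQUIREMENTS = (SubmitterGroupResourcesInUse < SubmitterGroupQuota) && \
                              (RemoteGroupResourcesInUse > RemoteGroupQuota)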

V8.4 Schedd Bug
- The schedd spent up to an hour recomputing an internal array using the autocluster → jobid mapping and its reverse
- Jobs would die after their shadows could not talk to a schedd that had gone dark
- After a day of debugging, a code fix was implemented; built and tested at BNL, the maximum time dropped from 1 hour to 2 seconds
- Thank you to Todd & TJ
- We still suspect something is off in our environment (~500k autoclusters/day), but it is not a problem anymore!

USATLAS Tier-3 @ BNL
- Consolidates previously scattered Tier-3 facilities into a shared resource of ~1000 cores
- Many user groups represented; local submission
- Hierarchical group quotas; group-membership authorization problem
- Surplus sharing with group-based fair-share preemption
- After a slow start, usage has increased in the past few months
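The talk does not list the actual quotas; a minimal sketch of hierarchical group quotas with surplus sharing, using hypothetical group names and numbers:

    # Hierarchical group quotas (group names and values are hypothetical)
    GROUP_NAMES = group_bio, group_nano, group_nano.cfn
    GROUP_QUOTA_group_bio      = 400
    GROUP_QUOTA_group_nano     = 600
    GROUP_QUOTA_group_nano.cfn = 300
    # Let groups use idle cores beyond their quota (surplus sharing)
    GROUP_ACCEPT_SURPLUS = True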

Group Membership
- Extended the group-quota editing web UI with user → institute → group mappings
- A cron job generates a config fragment that asserts the Owner → Group mapping in the START expression
- The accounting group is required at submission
- Currently 74 users and 26 groups use the Tier-3
- Surplus sharing and group-respecting preemption via RemoteGroupResourcesInUse
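The generated fragment itself is not shown in the talk; a sketch of the general pattern, with hypothetical user and group names:

    # Sketch of a cron-generated fragment (user and group names are hypothetical)
    # Accept a job only if its Owner is a member of the accounting group it claims.
    START = $(START) && ( \
        (TARGET.AcctGroup == "group_bio"  && stringListMember(TARGET.Owner, "alice,bob")) || \
        (TARGET.AcctGroup == "group_nano" && stringListMember(TARGET.Owner, "carol,dave")) )
    # Schedd-side: refuse jobs submitted without an accounting group
    SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) HasGroup
    SUBMIT_REQUIREMENT_HasGroup = AcctGroup =!= undefined
    SUBMIT_REQUIREMENT_HasGroup_REASON = "Specify an accounting group (accounting_group) at submit time."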

Preemptable Partitionable Slots
- Users frequently ask to run high-memory jobs, which motivates user-priority preemption with partitionable slots
- We absolutely need to support group-constrained fair-share with preemption
- Currently pslot preemption operates on the entire slot, and an entire group's quota can fit on one 40-core machine
- This is a function of the small scale: 30+ groups sharing ~1000 cores
- There is no way to respect group quotas as the schedd splits slots; not currently possible!
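For reference, the whole-slot behavior described here is what the negotiator-side partitionable-slot preemption knob enables; a one-line sketch, assuming the 8.4-era configuration name:

    # Negotiator-side partitionable-slot preemption: the negotiator may preempt
    # the dynamic slots beneath a partitionable slot to make room for a new job.
    # As noted above, this operates on the entire slot rather than respecting
    # group quotas as the slot is split.
    ALLOW_PSLOT_PREEMPTION = True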

New Directions
What We Talk About When We Talk About H[TP]C

New Science & Evolving Needs

Traditional model:
- RACF as a microcosm of HEP/NP computing; many other facilities of similar (within an order of magnitude) scale and capability
- Embarrassingly parallel workloads
- Data storage as large a problem as computing, or larger
- Batch is simple: provisioning/matchmaking rather than scheduling
- Large, persistent, well-staffed experiments

New model:
- New users with traditionally HPC-based workloads: National Synchrotron Light Source II, Center for Functional Nanomaterials
- Revolving userbase, with no institutional repository of computing knowledge and little experience with large-scale computing
- Software support not well-defined: a large pool of poorly supported open-source or free/abandon-ware, plus commercial or GUI-interactive software

Institutional Cluster
- New cluster of 108 nodes, each with 2 GPUs; plans to double next year, then perhaps grow further
- InfiniBand interconnect in a fat-tree topology
- SLURM is being evaluated: it seems to be the growing choice for new HPC clusters (active development, DOE experience, open source, large userbase including 6 of the top 10 of the TOP500)
- Testing sample workloads in proposed queue configurations; setting up Shifter for Docker integration
- Will be run as a traditional HPC cluster; MPI support is an important consideration
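To give a flavor of the Shifter setup being tested, a sketch of a SLURM batch script; the image name, partition, and the presence of the Shifter SLURM plugin are assumptions, not details from the talk:

    #!/bin/bash
    #SBATCH --job-name=shifter-test
    #SBATCH --partition=gpu                           # hypothetical queue name
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=1
    #SBATCH --gres=gpu:2
    #SBATCH --image=docker:nvidia/cuda:8.0-runtime    # requires the Shifter SPANK plugin

    # Run the containerized payload on each allocated node under Shifter
    srun shifter ./run_payload.sh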

New Science & Evolving Needs
- Lab-management support for consolidation of computing
- The Computational Science Initiative (CSI) at BNL leverages RACF experience in support of other, non-HEP/NP science domains
- CSI is also organizing software support to fill a gap in current BNL computational services
- The $10,000 question: can we leverage existing infrastructure?

Running at an HTC Facility
Zero-order requirements:
- Embarrassingly parallel: (small) input, (one) process, (small) output, with no communication of intermediate results
- x86_64: other hardware is not standard in the community
- Data accessible: may seem obvious, but you need adequate bandwidth to get data to the compute and back; something to think about when moving from a single desktop
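As an illustration only (the executable and input names are hypothetical), a workload that meets these zero-order requirements maps onto a very small HTCondor submit file:

    # One small input, one process, one small output, no inter-job communication
    universe                = vanilla
    executable              = process_one.sh
    arguments               = input_$(Process).dat
    transfer_input_files    = input_$(Process).dat
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    requirements            = (Arch == "X86_64")
    output                  = job_$(Process).out
    error                   = job_$(Process).err
    log                     = jobs.log
    queue 100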

Running at an HTC Facility
First-order requirements:
- Linux (Red Hat): virtualization adds complexity and Windows is expensive; containers/Docker allow simple cross-Linux compatibility
- Free software: instance-limited licenses are hard to control across many pools, and license costs become prohibitive with exponential computing growth
- Friendly resource profile: code must run not just within the machine, but within the general limits its neighboring jobs use (see the sketch below)
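A hedged illustration of declaring such a profile up front so the job packs cleanly next to its neighbors; the values are examples, not site limits:

    # Declare the job's footprint in the submit file
    request_cpus   = 1
    request_memory = 2 GB
    request_disk   = 10 GB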

HTC/HPC Divide
- Not a false dichotomy, but surely an increasingly blurry line
- Several users in our experience fit in the middle ground (albeit with considerable help from the RACF to fit their workload into an exclusively HTC environment):
  1. Biology group: 800 cores for 5 months on a simple dedicated scheduler
  2. Wisconsin group at CFN: successfully ran opportunistically on RHIC resources
- Key factors: how much state is transferred, and with what I/O patterns?
- Size: from 10 years ago to now, 2 racks collapse into 1 machine; how many problems fit inside one machine today?

Scheduling
- HTCondor can now submit to SLURM via the grid universe (see the sketch below)
- Different sharing models:
  1. Condor-as-SLURM-job (glidein-style)
  2. Coexist, mutually excluding via policy
  3. Flocking/routing (needs work for our users)
- Ideal: transparent for users who know the requirements of their workload
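A sketch of the grid-universe route mentioned above, assuming the batch GAHP with SLURM support that recent HTCondor releases provide; the executable and file names are placeholders:

    # Route a job to SLURM through the HTCondor grid universe
    universe      = grid
    grid_resource = batch slurm
    executable    = analysis.sh
    output        = analysis.out
    error         = analysis.err
    log           = analysis.log
    queue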

The End
Thank you for your time! Questions? Comments?