The HTCondor CacheD. Derek Weitzel, Brian Bockelman, University of Nebraska-Lincoln


Today's Talk: This talk summarizes part of my PhD dissertation. This work has also been accepted to PDPTA '15.

HTCondor CacheD: The CacheD makes an atomic file cache a first-class citizen in HTCondor, alongside jobs. A cache can be scheduled, be evicted, have requirements, and be stored on worker nodes.

HTCondor CacheD: Policy language for (proactive) replication preferences; ordered preferences for transfer method (BitTorrent or HTCondor File Transfer); CacheD parenting.

Replication Policy Language: The HTCondor CacheD allows you to specify a policy expression that is matched against other CacheDs. When creating a cache, you can say, for example: replicate the cache on nodes with more than X cores and more than Y disk space. Proactive replication can copy the cache to nodes while other jobs are running, saving time when your job begins.
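As a minimal illustration of this kind of policy matching, the Python sketch below filters CacheD "ads" against a requirements-style predicate. The dictionary schema and the attribute names (TotalCores, TotalDiskGB) are assumptions for clarity; the real CacheD matches ClassAd expressions, so this is a sketch of the idea, not the actual implementation.

# Minimal sketch, assuming dictionary "ads" with made-up attribute names;
# the real CacheD evaluates ClassAd expressions against other CacheDs.

def matching_cacheds(policy, cached_ads):
    """Return the names of CacheDs whose advertised attributes satisfy the policy."""
    return [ad["Name"] for ad in cached_ads if policy(ad)]

# Policy: replicate only on nodes with more than 8 cores and more than 20 GB of disk.
policy = lambda ad: ad["TotalCores"] > 8 and ad["TotalDiskGB"] > 20

worker_cacheds = [
    {"Name": "cached@node1", "TotalCores": 16, "TotalDiskGB": 100},
    {"Name": "cached@node2", "TotalCores": 4,  "TotalDiskGB": 500},
]

print(matching_cacheds(policy, worker_cacheds))   # ['cached@node1']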

Replication Policy Language: On the other hand, you don't have to do proactive replication; you can force a cache to be replicated only when it is requested for a job.

Replication Transfer Method: Each cache can have a preference for transfer method; Direct and BitTorrent are available by default. Direct uses HTCondor File Transfer and is authenticated and encrypted. BitTorrent uses the BitTorrent protocol via libtorrent.
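One way to read "ordered preferences for transfer method" is as a try-in-order fallback. The sketch below shows that interpretation; the downloader functions and method names are hypothetical stand-ins, not real CacheD APIs, and whether the real daemon falls back in exactly this way is an assumption.

# Illustrative only: hypothetical downloaders showing an ordered fallback
# from the preferred method (BitTorrent) to the next one (Direct).

def download_via_bittorrent(cache_name):
    raise IOError("tracker unreachable")              # pretend BitTorrent failed

def download_via_direct(cache_name):
    return "/var/lib/condor/caches/" + cache_name     # pretend Direct succeeded

METHODS = {"BITTORRENT": download_via_bittorrent, "DIRECT": download_via_direct}

def replicate(cache_name, preference=("BITTORRENT", "DIRECT")):
    """Try each transfer method in the cache's preferred order."""
    for method in preference:
        try:
            return METHODS[method](cache_name)
        except IOError as err:
            print(method, "failed:", err)
    raise RuntimeError("all transfer methods failed for " + cache_name)

print(replicate("nr_database"))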

Cache Protocol:
1. The file transfer plugin requests local replication.
2. The local CacheD sends the request to its parent, or to the origin.
3. The local CacheD downloads the cache.
4. The file transfer plugin checks the status of the local cache.
5. Upon local replication, the plugin downloads the cache.
(Sequence diagram: the File Transfer Plugin, the node-local CacheD, and the origin CacheD exchange cache replication requests, wait for a notice of replication complete, then perform the cache download and symlink creation.)
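As a rough illustration of this flow from the file transfer plugin's point of view, here is a minimal Python sketch. The LocalCacheD stub and its request_replication/replication_complete/fetch methods are hypothetical stand-ins for the real plugin-to-CacheD interface described in the five steps above.

# Hedged sketch of the cache protocol; the stub class is not the real daemon.
import time

class LocalCacheD:
    """Toy stand-in for the node-local CacheD."""
    def __init__(self):
        self._ready = {}

    def request_replication(self, cache_name):
        # Steps 2-3: the real CacheD forwards the request to its parent
        # (or the origin) and downloads the cache; here we just mark it ready.
        self._ready[cache_name] = True

    def replication_complete(self, cache_name):
        return self._ready.get(cache_name, False)

    def fetch(self, cache_name, dest):
        # Step 5: hand the replicated cache to the job sandbox.
        return dest + "/" + cache_name

def stage_in(cache_name, local_cached, job_sandbox):
    # Step 1: ask the node-local CacheD to replicate the cache.
    local_cached.request_replication(cache_name)
    # Step 4: poll until the local replica is in place.
    while not local_cached.replication_complete(cache_name):
        time.sleep(1)
    # Step 5: download (or symlink) the cache into the job sandbox.
    return local_cached.fetch(cache_name, dest=job_sandbox)

print(stage_in("nr_database", LocalCacheD(), "/var/lib/condor/execute/dir_1234"))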

Once on the node: If the job slot belongs to a CacheD, the CacheD symlinks the cache into the execution directory. In the glidein case, a symlink cannot be used between job slots on the same node; instead, the slot's CacheD Direct-copies the cache from the node-local parent CacheD into the job's execute directory.
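A minimal sketch of that symlink-versus-copy decision follows. The paths and the slot_owned_by_cached flag are assumptions for illustration only.

# Symlink when the slot belongs to the CacheD holding the replica;
# full copy in the glidein case. Names and paths are illustrative.
import os
import shutil

def place_cache(cache_path, execute_dir, slot_owned_by_cached):
    dest = os.path.join(execute_dir, os.path.basename(cache_path))
    if slot_owned_by_cached:
        os.symlink(cache_path, dest)       # no extra disk used in the job sandbox
    else:
        shutil.copytree(cache_path, dest)  # glidein case: Direct copy from the parent CacheD
    return dest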

Parenting: Each cache on a CacheD has a parent, which is either the CacheD's own parent or the origin. (Diagram: the cache origin on the user's submit server is the parent of cluster-level CacheDs, which in turn are parents of worker-node CacheDs.)

Parenting: BitTorrent causes extremely high I/O load on a node. Every CacheD can parent to a node-local CacheD so that only one CacheD uses BitTorrent at a time; all child CacheDs copy the cache with the Direct transfer method.
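A rough sketch of this rule: the one node-local parent CacheD downloads each cache via BitTorrent exactly once, and every child CacheD on the node pulls from it with Direct. The class layout and paths are illustrative assumptions, not the daemon's real structure.

# One BitTorrent downloader per node; children copy from the parent.
class NodeLocalParentCacheD:
    def __init__(self):
        self._caches = {}

    def replicate(self, cache_name):
        if cache_name not in self._caches:                            # only one BitTorrent
            self._caches[cache_name] = "/local/caches/" + cache_name  # download per node
        return self._caches[cache_name]

class ChildCacheD:
    def __init__(self, parent):
        self.parent = parent

    def replicate(self, cache_name):
        source = self.parent.replicate(cache_name)   # parent fetched it via BitTorrent
        return "Direct copy of " + source            # child uses Direct, never BitTorrent

parent = NodeLocalParentCacheD()
for child in [ChildCacheD(parent) for _ in range(4)]:   # e.g. one per job slot
    print(child.replicate("nr_database"))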

Evaluation: Since this work is part of my PhD on campus computing, we first evaluated the CacheD on a local campus cluster. We submitted a BLAST benchmark using the NR database and looked at stage-in without and with the cache.

Stage-in, No Cache: Measures the transfer speed of BitTorrent vs. Direct HTCondor file transfers, evaluated against the number of distinct nodes downloading a copy of the cache. The origin server for these tests has a 1 Gbps connection.

Stage-in, No Cache: Measured the time required to stage in 15 GB of the NR database vs. the number of distinct nodes. (Figure: average stage-in time in seconds vs. number of distinct nodes, 20-70, for the BitTorrent and Direct transfer methods.)

Stage-in, No Cache: The Direct method showed a linear increase in average stage-in time. (Same figure as the previous slide.)

Stage-in, No Cache: BitTorrent showed very little, if any, increase in transfer time as the number of distinct nodes increased. (Same figure as the previous slide.)

Stage-in, Cached: Ran a varying number of jobs with the same number of available nodes (50) and measured the total stage-in time for the cache. Once the cache is stored on a node, replications are quick.

Stage-in, Cached: There is a noticeable curve at 50 jobs; subsequent jobs have zero stage-in time. (Figure: stage-in time in hours vs. number of jobs, 0-200, for the BitTorrent and Direct HTCondor File Transfer methods.)

Stage-in, Cached: Again, BitTorrent is faster than the Direct (HTCondor File Transfer) method. (Same figure as the previous slide.)

Stage-in, Cached: Caching significantly decreases stage-in time when there is a cache hit. (Same figure as the previous slide.)

Additional Observations: BitTorrent multiplies the available bandwidth to a cache origin server.

Additional Observations: BitTorrent multiplies the available bandwidth to a cache origin server, even on the OSG.

Conclusions: BitTorrent is a very efficient method for transferring common files on a cluster. The CacheD is a flexible agent for storing common files for reuse. We are currently running tests on the effectiveness of the CacheD on the Open Science Grid.