glideinwms Frontend Installation

glideinwms Training @ UCSD glideinwms Frontend Installation Part 1 Condor Installation by Igor Sfiligoi (UCSD) UCSD Jan 17th 2012 Condor Install 1

Overview
- Introduction
- Planning and Common setup
- Central Manager Installation
- Submit node Installation

Refresher - Glideins
- A glidein is just a properly configured Condor execution node submitted as a Grid job
- glideinwms provides automation
[Figure: submit nodes (Schedd), central manager (Collector, Negotiator), glideinwms, and execution nodes running glideins (Startd, Job) reached through CREAM and Globus]

Refresher - Glideins
- The glideinwms triggers glidein submission
- The regular negotiator matches jobs to glideins
[Figure: submit nodes (Schedd), central manager (Collector, Negotiator), glideinwms, and execution nodes running glideins (Startd, Job) reached through CREAM and Globus]

Bottom line
- Condor is king! (glideinwms is just a small layer on top)

Condor installation
- Proper Condor installation and configuration is the most important task
  - Condor will do most of the work and is thus the most resource hungry
- GlideinWMS installation is almost an afterthought
  - Although it does require proper security config of Condor
  - GlideinWMS installation proper will be described in a separate talk

Planning and Common setup

Refresher - Condor
- Two main node types
  - Submit node(s)
  - Central manager
  - (execute nodes are dynamic glideins)
- Public TCP/IP networking needed
- GSI used for network security
[Figure: submit node (Schedd), central manager (Collector, Negotiator), and a glidein]

Planning the setup
- In theory, all Condor daemons can be installed on a single node
- However, if at all possible, put the Central Manager on a dedicated node
  - i.e. do not use it as a submit node, too
  - Both for security and stability reasons
- You may want/need more than one submit node
  - Depends on expected use and available HW
  - You do need at least one, though

Common system considerations
- Condor is supported on a wide variety of platforms
  - Including Linux (e.g. RHEL5), MacOS and Windows
  - Linux recommended in OSG (and assumed in the rest of the talk)
- GSI security requires
  - Host or service certificate
  - CAs & CRLs
    - Typically delivered via OSG RPMs (but other means acceptable)
      https://twiki.grid.iu.edu/bin/view/documentation/release3/installcertauth
- Full Grid Client software recommended (for ease of ops)
  https://twiki.grid.iu.edu/bin/view/documentation/release3/installosgclient

OSG Grid Client
- Requires RHEL5-compatible Linux
  - RHEL6 support promised for early 2012
- Procedure in a nutshell
  - Add EPEL and OSG RPM repositories to sys conf.
  - yum install osg-ca-certs
  - yum install osg-client
  - Enable CRL fetching crontab
- Other Grid clients (e.g. EGI/glite) will work just as well
https://twiki.grid.iu.edu/bin/view/documentation/release3/installosgclient

Requesting a host certificate
- OSG provides a script to talk to DOEGrids
  https://twiki.grid.iu.edu/bin/view/documentation/release3/gethostservicecertificates
- Procedure in a nutshell
  - Install OSG client
  - yum install osg-cert-scripts
  - cert-request
  - Wait for email
  - cert-retrieve
  - cp into /etc/grid-security/
- If you have other ways to obtain a host cert, feel free to use them

Condor Central Manager

Refresher - Central Manager
- Two (groups of) processes
  - Collector
  - Negotiator
- The Collector defines the Condor pool
  - Knows about all the glideins it owns
  - Knows about all the schedds
- The Negotiator does the matchmaking
  - Decides who gets what resources
[Figure: central manager containing the Collector and the Negotiator]

Condor Collector considerations
- The Collector is the repository of all knowledge
  - All other daemons report to it
  - Including the glideins, who get its address at run-time
- Must process lots of info
  - One update every 5 mins from each and every daemon
  - With strong security, this is expensive
- Typically deployed as a tree of collectors
  - All security handled in the leaves
  - Top one still has the complete picture
[Figure: central manager with the Negotiator and a top Collector fed by leaf Collectors, which receive the updates from the glideins]
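
To get a feel for the update load described above, here is a minimal sketch in Python. It assumes, for simplicity, one reporting daemon per glidein and an even spread over the leaf collectors; the 25k glideins and 200 secondary collectors are the example figures that appear elsewhere in this talk, and the function names are made up for illustration.

```python
# Rough Collector load estimate.
# From the slide: each daemon sends one update every 5 minutes.
UPDATE_INTERVAL_S = 5 * 60


def updates_per_second(n_daemons: int, interval_s: int = UPDATE_INTERVAL_S) -> float:
    """Average number of ClassAd updates arriving per second, pool-wide."""
    return n_daemons / interval_s


def per_leaf_rate(n_daemons: int, n_leaf_collectors: int) -> float:
    """Average update rate per leaf collector, assuming an even spread (an assumption)."""
    return updates_per_second(n_daemons) / n_leaf_collectors


if __name__ == "__main__":
    print(f"{updates_per_second(25_000):.1f} updates/s in total")
    print(f"{per_leaf_rate(25_000, 200):.2f} updates/s per leaf collector")
```

Since each update has to be authenticated, the per-leaf rate is the number that matters for the security cost; the top collector only receives forwarded ads.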

CCB - An additional cost
- The Condor collectors are also acting as CCBs
- Each glidein will open 5+ long-lived TCP sockets
- Make sure you have enough file descriptors
  - Default OS limit is 1024 per process
- Plan on having one CCB per 100 glideins
  - Leaves in the tree of collectors
[Figure: CCB call-back diagram with the labels "I want to connect to the execute node", "Call me back" and "transfer files"]
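
The sizing rules on this slide can be turned into a small planning helper. This is only a sketch: the constants come straight from the slide, 5 sockets per glidein is the lower bound of "5+", and it assumes all of a glidein's long-lived sockets land on the CCB it is assigned to. The function names are hypothetical.

```python
import math

# Rules of thumb quoted on the slide
GLIDEINS_PER_CCB = 100     # "one CCB per 100 glideins"
SOCKETS_PER_GLIDEIN = 5    # "5+ long-lived TCP sockets" (lower bound)
DEFAULT_FD_LIMIT = 1024    # default per-process OS limit


def ccbs_needed(n_glideins: int, glideins_per_ccb: int = GLIDEINS_PER_CCB) -> int:
    """Number of CCB (leaf collector) processes to plan for."""
    return math.ceil(n_glideins / glideins_per_ccb)


def sockets_per_ccb(glideins_per_ccb: int = GLIDEINS_PER_CCB,
                    sockets_per_glidein: int = SOCKETS_PER_GLIDEIN) -> int:
    """Lower bound on the long-lived sockets held open by one CCB."""
    return glideins_per_ccb * sockets_per_glidein


def fits_fd_limit(glideins_per_ccb: int = GLIDEINS_PER_CCB,
                  fd_limit: int = DEFAULT_FD_LIMIT) -> bool:
    """True if one CCB stays within the per-process file descriptor limit."""
    return sockets_per_ccb(glideins_per_ccb) <= fd_limit


if __name__ == "__main__":
    print("CCBs for 25k glideins:", ccbs_needed(25_000))
    print("sockets per CCB:", sockets_per_ccb())
    print("300 glideins per CCB fits in 1024 fds:", fits_fd_limit(300))
```

With 100 glideins per CCB the lower bound stays well inside the default limit; packing more glideins per CCB is where the file descriptor limit has to be raised.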

High availability (theory)
- Central manager can be a single point of failure
  - If it dies, the Condor pool dies with it!
- To avoid this, one can deploy multiple CMs
  - All daemons will advertise to 2 (or more) Collectors
  - Currently not supported by glideinwms
  - All CMs will have the same view of the world
- There can only be one Negotiator, though
  - One negotiator will be Active, all others in standby
- More details in the Condor manual
  http://www.cs.wisc.edu/condor/manual/v7.6/3_11high_availability.html#section004112000000000000000

Hardware needs
- Tree of collectors spreads the load over multiple processes
  - So several CPUs come in handy
- Negotiator is single threaded
  - Will benefit from a fast CPU
- Memory usage not terrible
  - O(100k) per glidein to store ClassAds
  - Exact footprint depends on how many additional attributes the VO defines
  - Concrete CMS example: 25k glideins ~ 6G memory
- Negligible disk IO
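
The CMS data point can be scaled to other pool sizes with simple arithmetic. The sketch below assumes "6G" means 6 GiB and that memory grows linearly with the number of glideins; both are assumptions, and the real footprint depends on the VO-defined attributes mentioned above.

```python
GIB = 1024 ** 3
KIB = 1024

# CMS example from the slide: 25k glideins ~ 6G of memory
CMS_GLIDEINS = 25_000
CMS_MEMORY_BYTES = 6 * GIB   # assumption: "6G" read as 6 GiB


def bytes_per_glidein() -> float:
    """Per-glidein footprint implied by the CMS data point."""
    return CMS_MEMORY_BYTES / CMS_GLIDEINS


def estimate_memory_gib(n_glideins: int) -> float:
    """Linear extrapolation of central manager memory (assumption: linear scaling)."""
    return n_glideins * bytes_per_glidein() / GIB


if __name__ == "__main__":
    print(f"{bytes_per_glidein() / KIB:.0f} KiB per glidein")
    print(f"{estimate_memory_gib(10_000):.1f} GiB for 10k glideins")
```

The implied per-glidein figure is a few hundred KiB, which is consistent with the O(100k) order of magnitude on the slide.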

System considerations
- Does not need to run as root (although it can)
  - Minimize risk due to Condor bugs
  - Make sure the host cert is readable by that user
- Must be on the public IP network
  - Each collector listens on its own well defined port, which must be reachable by all glideins (WAN)
  - Negotiator has a dynamic listen port, which must be reachable by submit nodes (schedds)
  - Must open firewall at least for these
- Will use a large number of network sockets
  - Will overwhelm most firewalls
  - Consider disabling stateful firewalls (e.g. iptables)

Security considerations
- Cannot be firewalled, so endpoint security is needed
  - GSI security used (i.e. x509 certs) for networking
  - Limit administrative rights to local users (FS auth)
- The Collector is the central trust point of the pool
  - The DNs of all other daemons are whitelisted here, including:
    - Schedds
    - Glideins (i.e. pilot proxies)
    - Clients (e.g. glideinwms Frontend)

Installing the CM
- Two major burdens (for basic install)
  - Collector tree
  - Security setup
- The glideinwms installer helps with both (highly recommended)
  - Starting from Condor tarball
  - As any user (e.g. as non-root)
  - Easy-to-use update cmdline tool available, too
- RPM install also an option
  - Easy to keep up-to-date (i.e. yum update)
  - But you will need to configure by hand
  - And will run as root
    - Unless you hack the startup script

Collector tree setup
- In a nutshell, for each secondary collector:
  - Tell Master to start a collector on a different port
  - Forward ClassAds to the main Collector

    # repeat this group x N, once per secondary collector
    COLLECTORXXX = $(COLLECTOR)
    COLLECTORXXX_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/CollectorXXXLog"
    COLLECTORXXX_ARGS = -f -p YYYY
    DAEMON_LIST = $(DAEMON_LIST) COLLECTORXXX

    # forward ads to the main collector
    # (this is ignored by the main collector, since the address matches itself)
    CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
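
Writing the "x N" repetition by hand gets tedious for large trees. As an illustration only, the following Python sketch generates the block from the slide for N secondary collectors. The naming scheme (COLLECTOR0, COLLECTOR1, ...) and the starting port are hypothetical stand-ins for the XXX and YYYY placeholders; the glideinwms installer may use different ones.

```python
def secondary_collector_config(n_collectors: int, first_port: int) -> str:
    """Build condor_config.local lines for a tree of secondary collectors.

    Names and ports are illustrative placeholders for XXX / YYYY on the slide.
    """
    lines = []
    for i in range(n_collectors):
        name = f"COLLECTOR{i}"
        port = first_port + i
        lines += [
            f"{name} = $(COLLECTOR)",
            f'{name}_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector{i}Log"',
            f"{name}_ARGS = -f -p {port}",
            f"DAEMON_LIST = $(DAEMON_LIST) {name}",
            "",
        ]
    lines += [
        "# forward ads to the main collector",
        "# (this is ignored by the main collector, since the address matches itself)",
        "CONDOR_VIEW_HOST = $(COLLECTOR_HOST)",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # 9620 is an arbitrary example port, not a documented default
    print(secondary_collector_config(2, first_port=9620))
```

Each secondary collector gets its own log file and port, while CONDOR_VIEW_HOST is set once and applies to all of them.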

Security setup (1)
- In a nutshell
  - Configure basic GSI (i.e. point to CAs and host cert)
  - Set up authorization (i.e. switch to whitelist)
  - Whitelist all DNs
  - Enable GSI
- DN whitelisting is a bit annoying
  - Must be done in two places: in condor_config, and in condor_mapfile
  - And it is a regexp in condor_mapfile!
  - glideinwms provides a cmdline tool

Security setup (2)

    # condor_config.local

    # Configure GSI
    CERTIFICATE_MAPFILE=/home/condor/glidecondor/certs/condor_mapfile
    GSI_DAEMON_TRUSTED_CA_DIR=/etc/grid-security/certificates
    GSI_DAEMON_CERT = /home/condor/.globus/hostcert.pem
    GSI_DAEMON_KEY = /home/condor/.globus/hostkey.pem

    # Force whitelisting
    DENY_WRITE = anonymous@*
    DENY_ADMINISTRATOR = anonymous@*
    DENY_DAEMON = anonymous@*
    DENY_NEGOTIATOR = anonymous@*
    DENY_CLIENT = anonymous@*
    ALLOW_ADMINISTRATOR = $(CONDOR_HOST)
    ALLOW_WRITE = *
    # use only pilot DN, not FQAN
    USE_VOMS_ATTRIBUTES = False

    # list all DNs (one line per DN, x N)
    GSI_DAEMON_NAME=$(GSI_DAEMON_NAME),DNXXX

    # enable GSI (FS also enables local auth)
    SEC_DEFAULT_AUTHENTICATION_METHODS = FS,GSI
    SEC_DEFAULT_AUTHENTICATION = REQUIRED
    SEC_DEFAULT_ENCRYPTION = OPTIONAL
    SEC_DEFAULT_INTEGRITY = REQUIRED
    # optionally, relax client and read settings

condor_mapfile (the first line is repeated per DN, x N):

    GSI "^DNXXX$" UIDXXX
    GSI (.*) anonymous
    FS (.*) \1
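
Because the DN is a regexp in condor_mapfile but a plain string in condor_config, the two entries for one DN differ. The Python sketch below shows one way to generate both consistently; it is not what glidecondor_adddn does internally (that is not shown in this talk), the helper names are hypothetical, and the example DN is made up. Whether every escape sequence produced by `re.escape` is accepted by your Condor version's map file parser should be checked before relying on it.

```python
import re
from typing import List, Tuple


def mapfile_entry(dn: str, uid: str) -> str:
    """condor_mapfile line: the DN is anchored and regex-escaped."""
    return f'GSI "^{re.escape(dn)}$" {uid}'


def config_entry(dn: str) -> str:
    """condor_config.local line: the DN is used literally."""
    return f"GSI_DAEMON_NAME=$(GSI_DAEMON_NAME),{dn}"


def whitelist(entries: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (config lines, mapfile lines) for a list of (DN, UID) pairs."""
    config_lines = [config_entry(dn) for dn, _ in entries]
    mapfile_lines = [mapfile_entry(dn, uid) for dn, uid in entries]
    # catch-all rules from the slide go last: unknown DNs become anonymous,
    # local (FS) users map to themselves
    mapfile_lines += ["GSI (.*) anonymous", "FS (.*) \\1"]
    return config_lines, mapfile_lines


if __name__ == "__main__":
    # hypothetical DN, for illustration only
    cfg, mapf = whitelist([("/DC=org/DC=example/CN=host.example.org", "schedd1")])
    print("\n".join(cfg))
    print("\n".join(mapf))
```

Escaping matters mostly for the dots in host names: an unescaped "." would also match any other character in that position.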

Installing with Q&A installer

    ~/glideinwms/install$ ./glideinwms_install
    Please select: 4
    [4] User Pool Collector
    Where do you have the Condor tarball? /home/condor/downloads/condor-7.6.4-x86_rhap_5-stripped.tar.gz
    Where do you want to install it?: [/home/condor/glidecondor] /home/condor/glidecondor
    If something goes wrong with Condor, who should get email about it?: me@myemail
    Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
    Do you want to get it from VDT?: (y/n) y
    Do you have already a VDT installation?: (y/n) y
    Where is the VDT installed?: /etc/osg/wn-client
    Will you be using a proxy or a cert? (proxy/cert) cert
    Where is your certificate located?: /home/condor/.globus/hostcert.pem
    Where is your certificate key located?: /home/condor/.globus/hostkey.pem
    My DN = 'DN1'
    DN: DNXXX
    nickname: [condor001] uidxxx
    Is this a trusted Condor daemon?: (y/n) y
    DN:
    How many slave collectors do you want?: [5] 200
    What name would you like to use for this pool?: [My pool] MyVO
    What port should the collector be running?: [9618] 9618

- The DN / nickname / trusted daemon questions repeat x N, until an empty DN is entered
- You can also add the DNs as an independent step

Maintenance
- If you need to add more DNs, use the cmdline tool glidecondor_adddn

    ~/glideinwms/install$ ./glidecondor_adddn -daemon "DN of Schedd A" "DNA" UIDA
    Configuration files changed.
    Remember to reconfig the affected Condor daemons.

- To upgrade the Condor binaries, use the cmdline tool glidecondor_upgrade

    ~/glideinwms/install$ ./glidecondor_upgrade ~/Downloads/condor-7.6.5-x86_rhap_5-stripped.tar.gz
    Will update Condor in /home/condor/glidecondor
    ....
    Creating backup dir
    Putting new binaries in place
    Finished successfully
    Old binaries can be found in /home/condor/glidecondor/old.120102_13

Starting Condor
- The installer will start Condor for you, but you should still know how to stop and start it by hand
- To start Condor, run:
    ~/glidecondor/start_condor.sh
- To stop Condor, use:
    condor_off -daemon master
- Finally, to force Condor to re-read the config:
    ~/glidecondor/sbin/condor_reconfig

Condor Submit node(s)

Refresher - Submit node(s)
- Submit node defined by the schedd
  - Which holds user jobs
- Shadows will be started as the jobs are matched to glideins
  - One per running job
- At least one submit node is needed
  - But there may be many
[Figure: submit node with a Schedd and one Shadow per running job]

Network use
- Glideins must contact the submit node in order to run jobs
  - Both with standard protocol and CCB
- Each shadow normally uses 2 random ports
  - Not firewall friendly
  - Can be a problem over O(10k) jobs
- Newer versions of Condor support the shared port daemon
  - Listens on a single port
  - Forwards the sockets to the appropriate local process
  - Does not reduce the number of sockets
  - Firewalls can thus get overwhelmed anyhow (see CM slides)
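
A quick calculation shows why 2 random ports per shadow becomes a problem around O(10k) jobs. In this sketch the port range passed to `max_jobs_for_port_range` is an assumption: 32768-61000 is a commonly seen Linux ephemeral range, but the actual values on a given host should be read from /proc/sys/net/ipv4/ip_local_port_range. Function names are illustrative.

```python
PORTS_PER_SHADOW = 2   # from the slide: each shadow normally uses 2 random ports


def ports_needed(running_jobs: int, use_shared_port: bool = False) -> int:
    """Listening ports on the submit node attributable to shadows."""
    if use_shared_port:
        # everything is reached through the single shared port
        # (the number of sockets is NOT reduced, only the number of ports)
        return 1
    return running_jobs * PORTS_PER_SHADOW


def max_jobs_for_port_range(low: int, high: int) -> int:
    """Upper bound on running jobs if shadows draw ports from [low, high]."""
    return (high - low + 1) // PORTS_PER_SHADOW


if __name__ == "__main__":
    print("ports for 10k running jobs:", ports_needed(10_000))
    print("with shared port daemon:", ports_needed(10_000, use_shared_port=True))
    # assumed range; check your own system
    print("job ceiling for 32768-61000:", max_jobs_for_port_range(32768, 61000))
```

The ceiling ignores every other process on the host that needs ports, so the practical limit is lower.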

Security considerations
- Like with the CM, must use endpoint security
  - Certificate DN based
- Schedd and CM must whitelist each other
- AuthZ with glideins is indirect
  - No need to whitelist glidein DN(s)
  - Collector trusts glidein, Schedd trusts Collector
  - Only startds can use indirect AuthZ
  http://research.cs.wisc.edu/condor/manual/v7.6/3_3configuration.html#param:secenablematchpasswordauthentication
- Schedd also must whitelist any clients (e.g. VO Frontend)
- Local users use FS auth (i.e. UID based)
[Figure: submit node (Schedd), central manager (Collector, Negotiator), and a glidein]

Hardware needs
- Submit node is memory hungry
  - 1M per running job due to shadows
  - O(10k) per job in queue for ClassAds
  - Actual need depends on how many additional VO attributes are used
- Schedd can use a fast CPU (single threaded)
  - Shadows are very light CPU users
- Jobs may put substantial IO load on HDD
  - Depends on how much data is being produced
  - Depends on how short the jobs are
- And the above is just for Condor
  - VO may have portal software or actual interactive users
  - Make sure the remaining HW is adequate for these
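
The two memory figures can be combined into a rough sizing formula. The sketch below reads "1M" as 1 MiB and "O(10k)" as 10 KiB, and treats the queue size as all jobs in the queue, running or idle; these readings are assumptions, and the result covers Condor only, not portals or interactive users.

```python
MIB = 1024 ** 2
KIB = 1024
GIB = 1024 ** 3


def submit_node_memory_bytes(running_jobs: int,
                             jobs_in_queue: int,
                             per_shadow: int = 1 * MIB,
                             per_queued_job: int = 10 * KIB) -> int:
    """Rough Condor memory need: shadows plus job ClassAds held by the schedd."""
    return running_jobs * per_shadow + jobs_in_queue * per_queued_job


if __name__ == "__main__":
    # illustrative workload: 10k running jobs out of 100k jobs in the queue
    need = submit_node_memory_bytes(10_000, 100_000)
    print(f"{need / GIB:.1f} GiB")
```

In this example the shadows dominate, which matches the slide's point that running jobs, not queued ones, drive the memory need.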

User account considerations
- Users must be able to launch condor_submit locally on the submit node
  - Remote submission not recommended (and disabled by default)
- VO must decide how to do it
  - SSHd (i.e. interactive use)
  - Portal (e.g. CMS CRABServer)
  - Still local from the Condor point of view
- Will need one UID per user
  - Non-UID based auth possible, but not recommended (and not supported out of the box)
- No need to create user accounts before installing Condor, but do plan for it

Schedd is a superuser
- Schedd must run as root (euid==0, even as it drops ruid to condor)
  - So it can switch UID as needed
  - To access user files
- Same for shadows (but ruid set to job user)
- Host cert thus must be owned by root

Installing the submit node
- Two major burdens (for basic install)
  - Shared port daemon
  - Security setup
- The glideinwms installer helps with both (highly recommended)
  - Starting from Condor tarball
  - Should be run as root
  - Easy-to-use update cmdline tool available, too
- RPM install also an option
  - Easy to keep up-to-date (i.e. yum update)
  - But you will need to configure by hand

Shared port daemon
- Not enabled by default in Condor
- In a nutshell
  - Pick a port for it
  - Enable it
  - Add it to the list of Daemons to start

    # condor_config.local
    # Enable shared_port_daemon
    SHARED_PORT_ARGS = -p 9615
    USE_SHARED_PORT = True
    DAEMON_LIST = $(DAEMON_LIST) SHARED_PORT

Security setup (1)
- In a nutshell
  - Configure basic GSI (i.e. point to CAs and host cert)
  - Enable match authentication
  - Set up authorization (i.e. switch to whitelist)
  - Whitelist all DNs
  - Enable GSI
- DN whitelisting is a bit annoying
  - Must be done in two places: in condor_config, and in condor_mapfile
  - And it is a regexp in condor_mapfile!
  - glideinwms provides a cmdline tool

Security setup (2)

    # condor_config.local

    # Configure GSI
    CERTIFICATE_MAPFILE=/opt/glidecondor/certs/condor_mapfile
    GSI_DAEMON_TRUSTED_CA_DIR=/etc/grid-security/certificates
    GSI_DAEMON_CERT = /etc/grid-security/hostcert.pem
    GSI_DAEMON_KEY = /etc/grid-security/hostkey.pem

    # Enable match authentication
    SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE

    # Force whitelisting
    DENY_WRITE = anonymous@*
    # see CM slides for details

    # list all DNs (one line per DN, x N)
    GSI_DAEMON_NAME=$(GSI_DAEMON_NAME),DNXXX

    # enable GSI (FS also enables local auth)
    SEC_DEFAULT_AUTHENTICATION_METHODS = FS,GSI
    SEC_DEFAULT_AUTHENTICATION = REQUIRED
    SEC_DEFAULT_ENCRYPTION = OPTIONAL
    SEC_DEFAULT_INTEGRITY = REQUIRED
    # optionally, relax client and read settings

condor_mapfile (the first line is repeated per DN, x N):

    GSI "^DNXXX$" UIDXXX
    GSI (.*) anonymous
    FS (.*) \1

Network optimization settings
- Since glideins are often behind firewalls
  - The glidein Startd setup is optimized to avoid incoming connections and UDP
- The Schedd must also play along

    # condor_config.local
    # Reverse protocol direction
    STARTD_SENDS_ALIVES = True
    # Avoid UDP
    SCHEDD_SEND_VACATE_VIA_TCP = True

Installing with Q&A installer

    ~/glideinwms/install$ ./glideinwms_install
    Please select: 5
    [5] User Schedd
    Which user should Condor run under?: [condor] condor
    Where do you have the Condor tarball? /root/condor-7.6.4-x86_rhap_5-stripped.tar.gz
    Where do you want to install it?: [/home/condor/glidecondor] /opt/glidecondor
    If something goes wrong with Condor, who should get email about it?: me@myemail
    Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
    Do you want to get it from VDT?: (y/n) y
    Do you have already a VDT installation?: (y/n) y
    Where is the VDT installed?: /etc/osg/wn-client
    Will you be using a proxy or a cert? (proxy/cert) cert
    Where is your certificate located?: /etc/grid-security/hostcert.pem
    Where is your certificate key located?: /etc/grid-security/hostkey.pem
    My DN = 'DN1'
    DN: DNXXX
    nickname: [condor001] uidxxx
    Is this a trusted Condor daemon?: (y/n) y
    DN:
    What node is the collector running (i.e. CONDOR_HOST)?: collectornode.mydomain
    Do you want to enable the shared_port_daemon?: (y/n) y
    What port should it use?: [9615] 9615
    How many secondary schedds do you want?: [9] 0

- The DN / nickname / trusted daemon questions repeat x N, until an empty DN is entered
- You can also add the DNs as an independent step

Maintenance
- If you need to add more DNs, use the cmdline tool glidecondor_adddn
  - Do not use -daemon for a client's DN

    ~/glideinwms/install$ ./glidecondor_adddn -daemon "DN of Schedd A" "DNA" UIDA
    Configuration files changed.
    Remember to reconfig the affected Condor daemons.

- To upgrade the Condor binaries, use the cmdline tool glidecondor_upgrade

    ~/glideinwms/install$ ./glidecondor_upgrade ~/Downloads/condor-7.6.5-x86_rhap_5-stripped.tar.gz
    Will update Condor in /home/condor/glidecondor
    ....
    Creating backup dir
    Putting new binaries in place
    Finished successfully
    Old binaries can be found in /home/condor/glidecondor/old.120102_13

Starting Condor
- The installer will start Condor for you, but you should still know how to stop and start it by hand
- The installer has created an init.d script for you:
    /etc/init.d/condor start|stop
- To force Condor to reload its config, still use:
    /opt/glidecondor/sbin/condor_reconfig
- All as root

Fine tuning

Fine tuning
- The previous slides provide only basic setup
  - Although the glideinwms does some basic tuning
- You will likely want to tune the system further
  - Proper limits in the submit node
  - Default job attributes
  - Sanity checks
  - Priority tuning
- Not part of this talk
  - Will go into details tomorrow

Integration with OSG Accounting

OSG Accounting
- OSG tries to keep accurate accounting information of who used what resources
  - Using GRATIA
https://twiki.grid.iu.edu/twiki/bin/view/accounting/webhome
http://gratia-osg-prod-reports.opensciencegrid.org/gratia-reporting/

Per-user accounting
- OSG has per-user accounting, too
- With glideins, this level of detail is lost
  - Only the pilot proxy is seen by OSG (sites)

The glidein GRATIA probe
- OSG thus asks glidein operators to install a dedicated probe alongside the glidein schedd(s)
  - Which will provide per-user accounting info to the OSG GRATIA server
  - Optimized for use with OSG glidein factory
https://twiki.grid.iu.edu/bin/view/accounting/probeconfigglideinwms
[Figure: submit node (Schedd with GRATIA Probe) reporting to the OSG GRATIA Server]

Installing the GRATIA probe
- In a nutshell
  - Register submit node with GOC
  - Tweak condor config
  - yum install gratia-probe-condor
  - Configure GRATIA
https://twiki.grid.iu.edu/bin/view/accounting/probeconfigglideinwms

Condor changes for GRATIA
- GRATIA gets information from history logs
  - Requires one file per terminated job for efficiency
- GRATIA needs to know where the job ran
  - Additional attribute added to the job ClassAd (more general details on this tomorrow)

    # condor_config.local
    PER_JOB_HISTORY_DIR = /var/lib/gratia/data
    JOBGLIDEIN_ResourceName=\
      "$$([IfThenElse(IsUndefined(TARGET.GLIDEIN_ResourceName), \
      IfThenElse(IsUndefined(TARGET.GLIDEIN_Site), \
      FileSystemDomain, TARGET.GLIDEIN_Site), \
      TARGET.GLIDEIN_ResourceName)])"
    SUBMIT_EXPRS = $(SUBMIT_EXPRS) JOBGLIDEIN_ResourceName
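
The nested IfThenElse in JOBGLIDEIN_ResourceName is a three-level fallback. The Python sketch below mirrors that logic for readability only; it is not how Condor evaluates ClassAds, `None` stands in for a ClassAd Undefined value, and the function and example values are hypothetical.

```python
from typing import Optional


def job_resource_name(glidein_resource_name: Optional[str],
                      glidein_site: Optional[str],
                      file_system_domain: str) -> str:
    """Mirror of the slide's expression; None plays the role of Undefined."""
    if glidein_resource_name is None:
        if glidein_site is None:
            # neither glidein attribute is defined: fall back to FileSystemDomain
            return file_system_domain
        return glidein_site
    return glidein_resource_name


if __name__ == "__main__":
    # made-up attribute values, for illustration only
    print(job_resource_name("ResourceA", "SiteA", "example.org"))
    print(job_resource_name(None, "SiteA", "example.org"))
    print(job_resource_name(None, None, "example.org"))
```

In other words, the glidein's resource name wins when it is present, the site name is the second choice, and FileSystemDomain is the last resort.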

GRATIA configuration
- Essentially just tell GRATIA what name you have registered with GOC
  - Then enable it
- You also need to tell it where to find Condor

    # /etc/gratia/condor/probeconfig
    SiteName="VOX_glidein_node1"
    EnableProbe="1"
    # add this line to allow user jobs without a proxy
    MapUnknownToGroup="1"

    # /root/setup.sh
    source /etc/profile.d/condor.sh

The End

Pointers
- The official glideinwms project Web page is
  http://tinyurl.com/glideinwms
- glideinwms development team is reachable at
  glideinwms-support@fnal.gov
- Condor Home Page
  http://www.cs.wisc.edu/condor/
- Condor support
  condor-user@cs.wisc.edu
  condor-admin@cs.wisc.edu

Acknowledgments
- The glideinwms is a CMS-led project developed mostly at FNAL, with contributions from UCSD and ISI
- The glideinwms factory operations at UCSD are sponsored by OSG
- The funding comes from NSF, DOE and the UC system