Co-ordination & Harmonisation of Advanced e-Infrastructures for Research and Education Data Sharing
Research Infrastructures, Grant Agreement n. 306819

CREAM Computing Element Overview
Bruce Becker, Coordinator, SAGrid (Meraka Institute)
Africa Arabia ROC (South) All Hands meeting
Outline
- CREAM-CE overview
- Functionality
- Integration with the batch system
- Sandbox management
- Security, authentication and authorisation
- Information provider
- Accounting system integration
- Where to get CREAM-CE
- Installation guide
- Configuration guide
- Some customisations
CREAM CE at a glance
- A lightweight web-service interface to the LRMS
- Implemented as an extension of the Java Axis servlet running inside the Apache Tomcat container
- Modular design:
  - CREAM service (Tomcat container)
  - BLAH services (BUpdater, BNotifier); BLAH (Batch Local ASCII Helper) provides the integration between the queue/scheduler (and its usage information) and CREAM
  - Resource BDII
  - Batch system services (usually TORQUE, with MAUI and munge)
  - Other services (MySQL, LB locallogger, Globus GridFTP)
CREAM CE functionality
- Job management, either directly on the CREAM CE or via the WMS
  - Using the generic CLI client or the C++/Java/Python API
- Direct staging of input sandbox files
- gLite JDL compliance (with CREAM-specific extensions)
- Support for batch and parallel jobs (job requirements are automatically forwarded to the batch system)
- Manual and automatic proxy delegation
- Job cancellation, info, list, suspend/resume
- Output retrieval
- Option for the admin to disable new submissions
- Self-limiting behaviour (new submissions are disabled automatically when the CE host is overloaded)
- Integration with the APEL and DGAS accounting systems
CREAM CE Architecture
[Figure: CREAM CE architecture diagram, not reproduced]
Integration with the batch system
Integration with the batch system is done by BLAH, which is responsible for:
- interacting with the batch system for job management operations (submit, cancel, etc.)
- notifying CREAM of job status changes, via the BLParser component
Supported batch systems:
- TORQUE
- Sun Grid Engine
- LSF (Platform Computing)
Sandbox Management
- Input Sandbox (ISB): files needed for the job to execute properly
  - Client data push and/or server data pull are supported
  - Client data push: files are staged from the client (UI) node
  - Server data pull: files are retrieved from a remote server
- Output Sandbox (OSB): files produced by the job
  - Server data push and/or client data pull are supported
  - Server data push: files are uploaded to a remote server
  - Client data pull: files are retrieved on the client (UI) node by the end user
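A small Python sketch can make the push/pull split concrete. The helper classify_input_sandbox is hypothetical (not part of CREAM or its clients), and it assumes that an InputSandbox entry carrying a URI scheme such as gsiftp:// is pulled by the CE, while plain paths are pushed from the UI.

```python
from urllib.parse import urlparse


def classify_input_sandbox(entries):
    """Split InputSandbox entries into client-pushed paths and server-pulled URIs.

    Assumption (illustrative only): an entry with a non-file URI scheme,
    such as gsiftp://, is fetched by the CE itself (server data pull);
    anything else is treated as a file staged from the UI (client data push).
    """
    client_push, server_pull = [], []
    for entry in entries:
        scheme = urlparse(entry).scheme
        if scheme and scheme != "file":
            server_pull.append(entry)
        else:
            client_push.append(entry)
    return client_push, server_pull


push, pull = classify_input_sandbox(
    ["input.dat", "/home/user/run.sh", "gsiftp://se.example.org/data/lut.bin"])
print(push)  # ['input.dat', '/home/user/run.sh']
print(pull)  # ['gsiftp://se.example.org/data/lut.bin']
```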
Security
- Authentication is implemented via Trustmanager
- Authorisation is implemented by gJAF (gLite Java Authorization Framework) or ARGUS
- A CREAM CE can have remote administrators: an admin can manage jobs submitted by other users and perform some privileged operations (enable/disable submissions)
- The DN of each CREAM admin must be listed in the admin list: /etc/grid-security/admin-list
Authorisation
For the CREAM-CE service:
- the grid/group mapfiles are read to check whether the submitting user is authorised
- glexec is used to determine which local user the grid user is mapped to
- all operations on behalf of this user are performed via sudo, using the local id returned by glexec
For GridFTP:
- authorisation is managed by LCAS (Local Centre Authorisation Service), which reads the grid/group mapfiles to check whether the submitting user is authorised
- LCMAPS is used to determine which local user the grid user is mapped to
To use gJAF, set the YAIM variable USE_ARGUS=no
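The mapfile lookup in the first step can be illustrated with a minimal Python sketch. It assumes the conventional grid-mapfile layout (a quoted DN or FQAN followed by a local account, with a leading dot marking a pool-account prefix); parse_mapfile and is_authorised are hypothetical names, and the sketch does not reproduce what gJAF, glexec or LCAS/LCMAPS do internally.

```python
import shlex


def parse_mapfile(text):
    """Parse grid-mapfile / groupmapfile style lines: "<DN or FQAN>" <account>.

    Returns {subject: (account, is_pool)}; a leading dot on the account
    (e.g. ".sagrid") conventionally denotes a pool-account prefix.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        try:
            parts = shlex.split(line)  # handles the quoted DN with spaces
        except ValueError:
            continue  # unbalanced quotes: skip rather than guess
        if len(parts) < 2:
            continue  # malformed entry
        subject, account = parts[0], parts[1]
        mapping[subject] = (account.lstrip("."), account.startswith("."))
    return mapping


def is_authorised(subject, mapping):
    """In this sketch, 'authorised' simply means 'listed in the mapfile'."""
    return subject in mapping


sample = '"/C=ZA/O=SAGrid/CN=Jane Doe" .sagrid\n"/alice/Role=lcgadmin" alicesgm\n'
print(parse_mapfile(sample))
```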
Authorisation with ARGUS
- ARGUS is a central authorisation service: all authorisation policies are defined on the ARGUS box
- It provides an easy way to keep control over credentials in a distributed infrastructure
- Usually there is one ARGUS box per grid
- CREAM contacts ARGUS to check whether the user is authorised; on success, ARGUS returns the mapped local user
- All operations are then performed on behalf of that user via sudo
To enable ARGUS on the site, set USE_ARGUS=yes. Other relevant YAIM variables:
- ARGUS_PEPD_ENDPOINTS: endpoint of the ARGUS box (e.g. https://argus.sagrid.ac.za:8154/authz)
- CREAM_PEPC_RESOURCE_ID: the ID of the CREAM CE in the ARGUS box
CREAM GRIS
- The CREAM CE runs a resource BDII publishing information about the CE
- Both the GLUE1 and GLUE2 schemas are supported
- Some information is dynamic and is produced by the GIP (Generic Information Provider) plugins/providers: see the scripts in /var/lib/bdii/gip/plugin and /var/lib/bdii/gip/provider, created by YAIM at configuration time
To query the resource BDII (anonymous bind with -x):
ldapsearch -x -h <cream-ce host> -p 2170 -b <basedn>
e.g.:
ldapsearch -x -h grid-ce.chpc.ac.za -p 2170 -b mds-vo-name=resource,o=grid (GLUE1)
ldapsearch -x -h grid-ce.chpc.ac.za -p 2170 -b o=glue (GLUE2)
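The same queries can be scripted. This Python sketch (hypothetical helper names) only builds and runs the ldapsearch command; it uses -H ldap://host:2170, which should be equivalent to the -h/-p form above and is also accepted by recent OpenLDAP clients that may no longer support -h/-p.

```python
import subprocess

# Base DNs published by the resource BDII, as given on the slide.
BASE_DN = {1: "mds-vo-name=resource,o=grid", 2: "o=glue"}


def ldapsearch_argv(host, glue_version=2, port=2170):
    """Build an anonymous (-x) ldapsearch command against a CE resource BDII."""
    if glue_version not in BASE_DN:
        raise ValueError("glue_version must be 1 or 2")
    return ["ldapsearch", "-x", "-LLL",           # -LLL: plain LDIF, no comments
            "-H", f"ldap://{host}:{port}",
            "-b", BASE_DN[glue_version]]


def query_resource_bdii(host, glue_version=2):
    """Run the query and return the LDIF text (requires the ldapsearch client)."""
    result = subprocess.run(ldapsearch_argv(host, glue_version),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, query_resource_bdii("grid-ce.chpc.ac.za", glue_version=1) would return the GLUE1 entries as text.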
Accounting system integration
- The CREAM CE natively supports both the DGAS and APEL accounting systems
- Integration is implemented through log files (written by BLAH) which are parsed by the APEL/DGAS sensors
- Accounting log files are stored in /var/log/cream/accounting
Example accounting output
[...]
"timestamp=2013-03-24 09:58:24" "userdn=/c=it/o=infn/ou=personal Certificate/L=ZA-ITHEMBALABS/CN=Sean Hamilton Thomas Murray" "userfqan=/alice/role=lcgadmin/capability=null" "userfqan=/alice/role=null/capability=null" "userfqan=/alice/lcg1/role=null/capability=null" "ceid=grid-ce.chpc.ac.za:8443/cream-pbs-alice" "jobid=cream414073843" "lrmsid=976285.grid-ce.chpc.ac.za" "localuser=60142" "clientid=cream_414073843"
"timestamp=2013-03-24 09:58:28" "userdn=/c=it/o=infn/ou=personal Certificate/L=ZA-ITHEMBALABS/CN=Sean Hamilton Thomas Murray" "userfqan=/alice/role=lcgadmin/capability=null" "userfqan=/alice/role=null/capability=null" "userfqan=/alice/lcg1/role=null/capability=null" "ceid=grid-ce.chpc.ac.za:8443/cream-pbs-alice" "jobid=cream744562483" "lrmsid=976286.grid-ce.chpc.ac.za" "localuser=60142" "clientid=cream_744562483"
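To inspect these records without the APEL/DGAS sensors, a minimal Python sketch can split each line into fields. It assumes one record per line made of double-quoted key=value fields, as in the excerpt above, and treats userfqan as the only repeated key; parse_accounting_record is a hypothetical helper, not the sensors' own parser.

```python
import shlex

# Keys that may appear several times in one record (assumption from the excerpt).
MULTI_VALUED = {"userfqan"}


def parse_accounting_record(line):
    """Parse one record of double-quoted "key=value" fields into a dict."""
    record = {}
    for field in shlex.split(line):          # removes the surrounding quotes
        key, sep, value = field.partition("=")  # split on the first '=' only
        if not sep:
            continue  # not a key=value field
        if key.lower() in MULTI_VALUED:
            record.setdefault(key, []).append(value)
        else:
            record[key] = value
    return record


line = ('"timestamp=2013-03-24 09:58:24" "userfqan=/alice/role=null/capability=null" '
        '"lrmsid=976285.grid-ce.chpc.ac.za" "localuser=60142"')
print(parse_accounting_record(line)["lrmsid"])  # 976285.grid-ce.chpc.ac.za
```

Splitting on the first '=' keeps DNs and FQANs, which themselves contain '=', intact in the value.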
Where to get CREAM
CREAM is distributed with EMI. Set up the relevant repositories for SL5/SL6 (x86_64):
OS repos:
rpm -Uvh http://scientificlinux.mirror.ac.za/5x/x86_64/sl/sl-release-5.9-2.sl.x86_64.rpm
rpm -Uvh http://scientificlinux.mirror.ac.za/6x/x86_64/os/packages/sl-release-6.3-1.x86_64.rpm
EPEL:
rpm -Uvh http://fedora.mirror.ac.za/epel/5/x86_64/epel-release-5-4.noarch.rpm
rpm -Uvh http://scientificlinux.mirror.ac.za/6x/x86_64/os/packages/epel-release-6-5.noarch.rpm
CA repo:
rpm -Uvh http://repository.egi.eu/sw/production/cas/1/current/rpms.production/ca-policy-egi-core-1.52-1.noarch.rpm
EMI repos:
rpm -Uvh http://emisoft.web.cern.ch/emisoft/dist/emi/2/sl5/x86_64/base/emi-release-2.0.0-1.sl5.noarch.rpm
Disable the DAG repository.
Installation prerequisites
- A host certificate is required in /etc/grid-security/host[cert,key].pem
- Check that:
  - SELinux is DISABLED
  - cron and logrotate are installed
  - NTP is installed, configured and running:
yum install -y ntp
chkconfig --level 2345 ntpd on
ntpdate tick.meraka.csir.co.za
/etc/init.d/ntpd restart
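Part of this checklist can be automated. The Python sketch below is illustrative only: it checks that the host certificate and key exist and that /etc/selinux/config sets SELINUX=disabled (the config file, not the runtime state reported by getenforce). cron, logrotate and NTP are not covered, and the function names are hypothetical.

```python
import os

HOST_CERT = "/etc/grid-security/hostcert.pem"
HOST_KEY = "/etc/grid-security/hostkey.pem"


def selinux_mode(config_text):
    """Return the SELINUX= value from /etc/selinux/config text, or None if absent."""
    mode = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if line.startswith("SELINUX="):       # does not match SELINUXTYPE=
            mode = line.split("=", 1)[1].strip().strip('"').lower()
    return mode


def prerequisite_problems(cert=HOST_CERT, key=HOST_KEY,
                          selinux_config="/etc/selinux/config"):
    """List unmet prerequisites: missing host cert/key, SELinux not disabled."""
    problems = [f"missing {path}" for path in (cert, key) if not os.path.isfile(path)]
    try:
        with open(selinux_config) as fh:
            mode = selinux_mode(fh.read())
    except FileNotFoundError:
        mode = None  # assumption: no config file means SELinux is not installed
    if mode not in (None, "disabled"):
        problems.append(f"SELinux is {mode}, expected disabled")
    return problems


for problem in prerequisite_problems():
    print(problem)
```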
Install metapackages
Install the required metapackages:
yum install yum-protectbase ca-policy-egi-core xml-commons-apis emi-cream-ce nfs-utils
Depending on which LRMS you have, install the corresponding CREAM module:
- SGE: yum install emi-sge-utils
- LSF: yum install emi-lsf-utils
- TORQUE:
  - if the CE is not the TORQUE server, or if you have your own version of TORQUE: yum install emi-torque-utils
  - if the CE is also the TORQUE server: yum install emi-torque-utils emi-torque-server
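The package choice above is a small decision table. This Python sketch (hypothetical function name, with -y added for non-interactive use) returns the yum command for a given LRMS, using only the package names listed on this slide.

```python
BASE_PACKAGES = ["yum-protectbase", "ca-policy-egi-core", "xml-commons-apis",
                 "emi-cream-ce", "nfs-utils"]

LRMS_PACKAGES = {
    "sge": ["emi-sge-utils"],
    "lsf": ["emi-lsf-utils"],
    "torque": ["emi-torque-utils"],
}


def yum_install_command(lrms, ce_is_torque_server=False):
    """Return the yum argv for the CREAM CE plus the module for the given LRMS."""
    lrms = lrms.lower()
    if lrms not in LRMS_PACKAGES:
        raise ValueError(f"unsupported LRMS: {lrms}")
    packages = BASE_PACKAGES + LRMS_PACKAGES[lrms]  # new list, constants untouched
    if lrms == "torque" and ce_is_torque_server:
        packages.append("emi-torque-server")
    return ["yum", "install", "-y"] + packages


print(" ".join(yum_install_command("TORQUE", ce_is_torque_server=True)))
```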
Munge configuration
Since EMI-1, munge (https://code.google.com/p/munge/) is used by TORQUE for inter-node authentication.
- Verify that munge has been correctly installed (from the EPEL repo):
# rpm -q munge munge-libs
munge-0.5.8-8.el5
munge-libs-0.5.8-8.el5
- On one host (e.g. the batch server), generate the munge key (it is created as /etc/munge/munge.key with permissions 400):
/usr/sbin/create-munge-key
- Copy the key to every host in the cluster, adjusting ownership:
# chown munge:munge /etc/munge/munge.key
- Start the munge daemon on every host (the WNs as well as the batch server) and enable it at boot:
/etc/init.d/munge start
chkconfig munge on
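After copying the key around, a quick sanity check helps. This Python sketch (hypothetical helpers, meant to be run as root on each host) checks that /etc/munge/munge.key has mode 400 and is owned by munge, and computes a SHA-256 fingerprint so that copies can be compared between hosts without displaying the key itself.

```python
import hashlib
import os
import pwd
import stat


def munge_key_problems(path="/etc/munge/munge.key", owner="munge"):
    """Return a list of problems with the munge key (empty if it looks fine)."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist"]
    problems = []
    mode = stat.S_IMODE(st.st_mode)
    if mode != 0o400:
        problems.append(f"{path} has mode {oct(mode)}, expected 0o400")
    try:
        actual = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid without a passwd entry
        actual = str(st.st_uid)
    if actual != owner:
        problems.append(f"{path} is owned by {actual}, expected {owner}")
    return problems


def key_fingerprint(path="/etc/munge/munge.key"):
    """SHA-256 of the key file; identical on every host if the copy succeeded."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()
```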
CREAM CE Configuration
Use the examples distributed with YAIM as starting points for the configuration files:
mkdir -p /opt/glite/yaim/etc/<site-name>
cp -r /opt/glite/yaim/examples/* /opt/glite/yaim/etc/<site-name>
Files to edit (here <site-name> is chpc):
/opt/glite/yaim/etc/chpc/
|-- edgusers.conf
|-- groups.conf
|-- site-info.def
|-- users.conf
|-- wn-list.conf
`-- services
    `-- glite-creamce
Edit the site-info.def file
For reference: https://twiki.cern.ch/twiki/bin/view/lcg/site-info_configuration_variables#cream_ce
- General configuration variables: BDII_HOST, WN_LIST, GROUPS_CONF, USERS_CONF, MYSQL_PASSWORD
- Site configuration variables: SITE_NAME, SITE_EMAIL, SITE_LAT, SITE_LONG
- Batch-server configuration variables: BATCH_LOG_DIR (for TORQUE/PBS, this must be set to the directory containing the server_logs directory, usually /var/torque), BATCH_SERVER, JOB_MANAGER, BATCH_VERSION
Edit the CE-specific variables
LRMS-specific variables:
MUNGE_KEY_FILE=/etc/munge/munge.key
CONFIG_MAUI=yes
CREAM-CE variables in services/glite-creamce:
CEMON_HOST=$CE_HOST
CREAM_DB_USER=cream
CREAM_DB_PASSWORD=SecurePassword
BLPARSER_HOST=$CE_HOST
BLPARSER_WITH_UPDATER_NOTIFIER=true
Full list at https://twiki.cern.ch/twiki/bin/view/lcg/site-info_configuration_variables#torque_server
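Since site-info.def and services/glite-creamce are shell-style assignment files, a rough pre-check can be scripted before running YAIM. This Python sketch assumes plain KEY=value lines only (no bash functions, arrays or command substitution), expands $VAR/${VAR} references such as $CE_HOST from earlier assignments, and uses an illustrative REQUIRED list drawn from these slides rather than YAIM's own checks. It is no substitute for yaim -v.

```python
import shlex
from string import Template

# Illustrative subset of variables from these slides; not YAIM's own required list.
REQUIRED = ["SITE_NAME", "BDII_HOST", "BATCH_SERVER", "MYSQL_PASSWORD",
            "CREAM_DB_USER", "CREAM_DB_PASSWORD"]


def parse_site_info(text, known=None):
    """Parse plain KEY=value assignments from site-info.def-style text.

    $VAR and ${VAR} are expanded from variables defined earlier (unlike bash,
    single-quoted values are expanded too). shlex treats an unquoted '#'
    anywhere in a word as a comment start, so quote values containing '#'.
    """
    values = dict(known or {})
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, rhs = line.partition("=")
        if not key.isidentifier():
            continue  # e.g. "export X=..." or other shell syntax
        try:
            tokens = shlex.split(rhs, comments=True)  # unquote, drop comments
        except ValueError:
            continue  # unbalanced quotes: skip rather than guess
        values[key] = Template(" ".join(tokens)).safe_substitute(values)
    return values


def missing_variables(values, required=REQUIRED):
    """Names from `required` that are unset or empty."""
    return [name for name in required if not values.get(name)]
```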
Configure the CREAM CE service
- If the CREAM CE is on the same host as the TORQUE server, configure the node types creamCE, TORQUE_server and TORQUE_utils; otherwise configure only creamCE and TORQUE_utils
- Verify that the site-info.def file is valid:
/opt/glite/yaim/bin/yaim -v -s /opt/glite/yaim/etc/<site-name>/site-info.def -n creamCE -n TORQUE_server -n TORQUE_utils
- Configure the services:
/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/<site-name>/site-info.def -n creamCE -n TORQUE_server -n TORQUE_utils
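The choice of node types can be captured in a small Python sketch (hypothetical function name) that builds the yaim command line: -v only verifies the configuration, -c actually configures, and TORQUE_server is added only when the CE also hosts the TORQUE server. The node-type names follow the YAIM convention (creamCE, TORQUE_server, TORQUE_utils).

```python
def yaim_argv(site_info, ce_is_torque_server, verify_only=False,
              yaim="/opt/glite/yaim/bin/yaim"):
    """Build the yaim command for a CREAM CE with a TORQUE batch system."""
    nodes = ["creamCE"]
    if ce_is_torque_server:
        nodes.append("TORQUE_server")
    nodes.append("TORQUE_utils")
    argv = [yaim, "-v" if verify_only else "-c", "-s", site_info]
    for node in nodes:
        argv += ["-n", node]
    return argv


site_info = "/opt/glite/yaim/etc/mysite/site-info.def"  # hypothetical site name
print(" ".join(yaim_argv(site_info, ce_is_torque_server=True, verify_only=True)))
```

Running the verify form first and the configure form only once it passes mirrors the two steps above.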
Checks to be done
Verify the configuration status:
- Check via a browser: https://<ce-host>:8443/ce-cream/services/listservices
- Check the log file to see whether the CREAM service started successfully:
grep "CREAM started! connection with BLParser" /var/log/cream/glite-ce-cream.log
- Test GridFTP (with a valid proxy):
uberftp grid-ce.chpc.ac.za
220 grid-ce.chpc.ac.za GridFTP Server 6.14 (gcc64, 1342551528-83) [Globus Toolkit 5.2.1] ready.
230 User sagrid014 logged in.
UberFTP>
- Try service-info:
glite-ce-service-info grid-ce.chpc.ac.za
Interface Version = [2.1]
Service Version = [1.13]
Description = [CREAM 2]
Started at = [Tue Feb 26 17:20:55 2013]
Submission enabled = [YES]
Status = [RUNNING]
Service Property = [cemon_url]->[na]
- Check that submission is enabled:
glite-ce-allowed-submission grid-ce.chpc.ac.za
Job Submission to this CREAM CE is enabled
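Both glite-ce-service-info and glite-ce-job-status print Key = [value] lines, so a single regular expression covers these checks. A minimal Python sketch with hypothetical helper names; lines in other shapes (such as Service Property = [cemon_url]->[na]) are not handled specially.

```python
import re

# "Key = [value]" with optional surrounding whitespace.
FIELD = re.compile(r"^\s*([^=]+?)\s*=\s*\[(.*)\]\s*$")


def parse_bracketed_output(text):
    """Collect 'Key = [value]' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def submission_enabled(service_info_text):
    """True if glite-ce-service-info reports Submission enabled = [YES]."""
    return parse_bracketed_output(service_info_text).get("Submission enabled") == "YES"
```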
Check job submission from the UI
-bash-3.2$ cat test.jdl
[
  Executable = "/bin/hostname";
  Arguments = "-f";
  StdOutput = "output";
  StdError = "output";
  OutputSandbox = {"output"};
  OutputSandboxBaseDestURI = "gsiftp://cream-ce.core.wits.ac.za/tmp";
]
-bash-3.2$ glite-ce-job-submit -a -r cream-ce.core.wits.ac.za:8443/cream-pbs-sagrid test.jdl
https://cream-ce.core.wits.ac.za:8443/CREAM492441750
-bash-3.2$ glite-ce-job-status https://cream-ce.core.wits.ac.za:8443/CREAM492441750
******  JobID=[https://cream-ce.core.wits.ac.za:8443/CREAM492441750]
        Status = [REALLY-RUNNING]
-bash-3.2$ glite-ce-job-status https://cream-ce.core.wits.ac.za:8443/CREAM492441750
******  JobID=[https://cream-ce.core.wits.ac.za:8443/CREAM492441750]
        Status = [DONE-OK]
        ExitCode = [0]
Which log files to look at?
- CREAM log files:
  /var/log/cream/glite-ce-cream.log
  /var/log/cream/glite-ce-bupdater.log
  /var/log/cream/glite-ce-bnotifier.log
- Tomcat log files:
  /usr/share/tomcat5/logs/trustmanager.log
  /usr/share/tomcat5/logs/catalina.out
- GridFTP log files:
  /var/log/globus-gridftp.log
  /var/log/gridftp-session.log
Processes in execution
Check that the following processes are all running:
- Tomcat
- multiple blahpd processes
- BUpdater and BNotifier (for the new BLAH parser)
- logd and interlogd (for the LB locallogger)
- slapd (GRIS)
- GridFTP
- MySQL
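This check can be scripted with ps. The Python sketch below assumes Linux procps (ps -eo args); the command-line fragments it looks for (such as blahpd, BUpdater, glite-lb-logd, globus-gridftp) are assumptions to adjust for the local installation, and matching is case-insensitive.

```python
import subprocess

# Service label -> fragment expected in its command line (assumed names).
EXPECTED = {
    "Tomcat": "tomcat",
    "BLAH": "blahpd",
    "BUpdater": "bupdater",
    "BNotifier": "bnotifier",
    "LB logd": "glite-lb-logd",
    "LB interlogd": "glite-lb-interlogd",
    "BDII slapd": "slapd",
    "GridFTP": "globus-gridftp",
    "MySQL": "mysqld",
}


def running_command_lines():
    """Full command lines of all processes, without the ps header."""
    out = subprocess.run(["ps", "-eo", "args"], capture_output=True,
                         text=True, check=True).stdout
    return out.splitlines()[1:]


def missing_services(command_lines, expected=EXPECTED):
    """Labels whose fragment appears in no command line."""
    lowered = [line.lower() for line in command_lines]
    return [name for name, fragment in expected.items()
            if not any(fragment.lower() in line for line in lowered)]


if __name__ == "__main__":
    print(missing_services(running_command_lines()) or "all expected processes found")
```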
CREAM CE customisation 1: local disks for sandboxes
- By default, sandboxes are created in the mapped user's home directory, e.g. /home/sagrid014/home_creamxxx, which is usually NFS-mounted
- For performance reasons, large sites may want to use local disks instead: NFS/autofs mounts can add a large network overhead or become a bottleneck
- Specify a directory on the local disk to be used for sandboxes, e.g. /scratch
- This customisation has to be done in the CREAM jobwrapper template; see http://grid.pd.infn.it/cream/field.php?n=main.howtocustomizethecreamjobwrapper
  - /etc/glite-ce-cream/jobwrapper.tpl
  - /var/lib/tomcat5/webapps/ce-cream/WEB-INF/jobwrapper.tpl in EMI-1
References
- CREAM-CE System Administrator's Guide: https://wiki.italiangrid.it/twiki/bin/view/cream/systemadministratorguideforemi2
- CREAM-CE Known Issues: https://wiki.italiangrid.it/twiki/bin/view/cream/knownissues
- CREAM User Guide: https://wiki.italiangrid.it/twiki/bin/view/cream/userguideemi2
- CREAM Troubleshooting Guide: https://wiki.italiangrid.it/twiki/bin/view/cream/troubleshootingguide
- Service Reference Card, including network requirements: https://wiki.italiangrid.it/twiki/bin/view/cream/servicereferencecardemi2
- EMI-2 product pages:
  - CREAM: http://www.eu-emi.eu/emi-2-matterhorn-products/-/asset_publisher/b4rk/content/cream-2
  - CREAM Torque module: http://www.eu-emi.eu/emi-2-matterhorn-products/-/asset_publisher/b4rk/content/cream-torque-module-1
Special thanks to Giuseppe Larocca for slides from the EMI training in Taipei (giuseppe.larocca@ct.infn.it)