Kerberos & HPC Batch systems. Matthieu Hautreux (CEA/DAM/DIF)

Similar documents
Using the MyProxy Online Credential Repository

MIT Kerberos & Red Hat

Network Security: Kerberos. Tuomas Aura

User Authentication Principles and Methods

SAS Grid Manager and Kerberos Authentication

Radius, LDAP, Radius, Kerberos used in Authenticating Users

Nicolas Williams Staff Engineer Sun Microsystems, Inc.

Identity Management In Red Hat Enterprise Linux. Dave Sirrine Solutions Architect

Kerberos and Active Directory symmetric cryptography in practice COSC412

Overview of Kerberos(I)

Henry B. Hotz Jet Propulsion Laboratory California Institute of Technology. Kerberos 5 Upgrade

Hardware Tokens in META Centre

CPSC 467b: Cryptography and Computer Security

Kerberos and NFS4 on Linux. isginf Workshop

Using Two-Factor Authentication to Connect to a Kerberos-enabled Informatica Domain

Kerberos V5. Raj Jain. Washington University in St. Louis

SDC EMEA 2019 Tel Aviv

DoD Common Access Card Authentication. Feature Description

SSH with Globus Auth

Factotum Sep. 24, 2007

Kerberos. Pehr Söderman Natsak08/DD2495 CSC KTH 2008

CIS 6930/4930 Computer and Network Security. Topic 7. Trusted Intermediaries

How to Integrate an External Authentication Server

SSSD. Client side identity management. LinuxDays 2012 Jakub Hrozek

Guidelines on non-browser access

Trusted Intermediaries

AIT 682: Network and Systems Security

SSSD: FROM AN LDAP CLIENT TO SYSTEM SECURITY SERVICES DEAMON

Cisco Transport Manager Release 9.2 Basic External Authentication

Likewise Open provides smooth integration with Active Directory environments. We show you how to install

Kerberized Certificate Issuance Protocol (KX509)

Radius, LDAP, Radius used in Authenticating Users

CSCE 813 Internet Security Kerberos

GLOBAL CATALOG SERVICE IMPLEMENTATION IN FREEIPA. Alexander Bokovoy Red Hat Inc. May 4th, 2017

Introduction to High-Performance Computing (HPC)

NFS on Steroids: Building Worldwide Distributed File System Gregory Touretsky Intel IT

Active Directory Attacks and Detection

User Manual. Admin Report Kit for IIS 7 (ARKIIS)

Introduction. Trusted Intermediaries. CSC/ECE 574 Computer and Network Security. Outline. CSC/ECE 574 Computer and Network Security.

CEA Site Report. SLURM User Group Meeting 2012 Matthieu Hautreux 26 septembre 2012 CEA 10 AVRIL 2012 PAGE 1

Linux Administration

Kerberos and Public-Key Infrastructure. Key Points. Trust model. Goal of Kerberos

Understanding the Local KDC

Cryptography and Network Security

Key distribution and certification

13/10/2013. Kerberos. Key distribution and certification. The Kerberos protocol was developed at MIT in the 1980.

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall Quiz II

High Scalability Resource Management with SLURM Supercomputing 2008 November 2008

Privileged Identity App Launcher and Session Recording

Security and Privacy in Computer Systems. Lecture 7 The Kerberos authentication system. Security policy, security models, trust Access control models

Credential Management in the Grid Security Infrastructure. GlobusWorld Security Workshop January 16, 2003

"Charting the Course... Enterprise Linux Security Administration Course Summary

Remote power and console management in large datacenters

GSS Context Management for NFS: The NFS PAG. William A. (Andy) Adamson Connectathon 2014

FreeIPA. Directory and authentication services the easy way. Christian Stankowic. Free and Open Source software Conference

GSI Online Credential Retrieval Requirements. Jim Basney

Storage Area Networks: Performance and Security

Grid Authentication and Authorisation Issues. Ákos Frohner at CERN

Novell Kerberos Login Method for NMASTM

glideinwms architecture by Igor Sfiligoi, Jeff Dost (UCSD)

2 SCANNING, PROBING, AND MAPPING VULNERABILITIES

FreeIPA Cross Forest Trusts

Acknowledgments. CSE565: Computer Security Lectures 16 & 17 Authentication & Applications

Cloudera s Enterprise Data Hub on the Amazon Web Services Cloud: Quick Start Reference Deployment October 2014

Lustre Client GSS with Linux Keyrings

XSEDE Canonical Use Case 4 Interactive Login

review of the potential methods

Proposed User Experience for Handling Multiple Identity Providers in Network Identity Manager 2.0

Subversion Plugin HTTPS Kerberos authentication

BeoLink.org. Design and build an inexpensive DFS. Fabrizio Manfredi Furuholmen. FrOSCon August 2008

Cisco Unified Provisioning Manager 2.2

Open Source in the Corporate World. Open Source. Single Sign On. Erin Mulder

UNICORE Globus: Interoperability of Grid Infrastructures

Distributed File Storage in Multi-Tenant Clouds using CephFS

--enable-gss Strong authentication in Lustre&friends

The Kerberos Authentication Service

Kerberos User Guide. Release 1.13 MIT

Shared Parallel Filesystems in Heterogeneous Linux Multi-Cluster Environments

A User-level Secure Grid File System

Cross-realm trusts with FreeIPA v3

CA Single Sign-On. Performance Test Report R12

Managing External Identity Sources

Single Sign-On Extensions Library THE BEST RUN. PUBLIC SAP Single Sign-On 3.0 SP02 Document Version:

BeeGFS. Parallel Cluster File System. Container Workshop ISC July Marco Merkel VP ww Sales, Consulting

Deploying the TeraGrid PKI

How to Configure Authentication and Access Control (AAA)

2017 TAG Cyber Security Invitational Course Advanced Cyber Security Technology for Practitioners

Authentication in Cloud Application: Claims-Based Identity Model

SAML-Based SSO Solution

CHAPTER 3. ENHANCED KERBEROS SECURITY: An application of the proposed system

Installing and Configuring VMware Identity Manager Connector (Windows) OCT 2018 VMware Identity Manager VMware Identity Manager 3.

SuperMike-II Launch Workshop. System Overview and Allocations

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

Delivers cost savings, high definition display, and supercharged sharing

ticrypt DEPLOYMENT OVERVIEW AND TIMELINE Information about hardware, deployment, and on-boarding

Updates from MIT Kerberos

NFSv4 Multi-Domain Access. Andy Adamson Connectathon 2010

A Roadmap for Integration of Grid Security with One-Time Passwords

Kerberos Introduction. Jim Binkley-

NetIQ Privileged Account Manager 3.5 includes new features, improves usability and resolves several previous issues.

Transcription:

Kerberos & HPC Batch systems Matthieu Hautreux (CEA/DAM/DIF) matthieu.hautreux@cea.fr

Outline Kerberos authentication HPC site environment Kerberos & HPC systems AUKS From HPC site to HPC Grid environment M. Hautreux Kerberos Workshop 2010 2

Kerberos authentication Key concepts Trusted third party Commonly made of 1 server and its backup (KDC) Single Sign-On Based on Forwardable/Forwarded TGT Limited credentials lifetime With renewal mechanism Footprint Numerous Supported OS Linux-Based systems, OS X, Microsoft,... Numerous Supported Services OpenSSH, LDAP,... Numerous Supported Distributed File System OpenAFS, NFS, NFSv4,... Mostly in private network M. Hautreux Kerberos Workshop 2010 3

Kerberos authentication OpenSSH, common usage of kerberos Simplify cascading connections authentication (SSO) Provides connection trees from users to their resources Limited validity through expiration time Each connection associated to a validity countdown, the forwarded TGT lifetime OpenSSH, enhanced usage of kerberos Based on cascading credentials refresh Provided by Simon Wilkinson GSSAPI Key-exchange patch Integrated in GSI-SSH (since 4.7) Ease refresh of the connections tree Each connection now associated to the validity countdown of the initial client Initial client credential renew is the single spark to refresh the tree M. Hautreux Kerberos Workshop 2010 4

HPC environment HPC key concepts Distributed systems Centralized to be remotely used by numerous users Large systems Thousands of compute nodes/cores Heavy loaded systems From short and small to large and long computations With numerous running and pending jobs With non negligeable delays between jobs submission and start time Common HPC components Batch systems and Parallel launchers Schedule jobs, grant resources access and launch computations Slurm, Torque, openssh,... Distributed File Systems Share data efficiently between multiple resources Lustre, GPFS,... M. Hautreux Kerberos Workshop 2010 5

HPC environment Common usage Login Nodes connection Using openssh/gsi-ssh Data staging NAS <-> Cluster FS / Local FS transfers Data processing Application development Results preprocessing/postprocessing Interactive jobs execution With a batch system and a parallel launcher May perform data staging too For application development and validation For pre/post-processing Batch jobs submission With a batch system and a parallel launcher May perform data staging too For non-interactive production computation M. Hautreux Kerberos Workshop 2010 6

Kerberos & HPC systems Kerberos interests in HPC Ease user access to compute services Workstation to login nodes connections Ease compute nodes access Login nodes to compute nodes connections For monitoring, debugging,... Secure data staging stages Access data on secured NAS seamlessly For both interactive and batch mode Secure remote connections Contact external servers securely For both interactive and batch mode Secured distributed services access Inside/Outside the clusters Services access tracability M. Hautreux Kerberos Workshop 2010 7

Kerberos & HPC systems Kerberos concerns in HPC Credential Lifetime management What is a common session time in a HPC environment How to get benefit from kerberos integrated renew mechanism Batch mode No interactive input from user involved From where to get a valid credential? Scalability Trusted third party behavior with thousands of active nodes Credential forwarding strategies with thousands of peers HPC specific tools Are they providing kerberos support? M. Hautreux Kerberos Workshop 2010 8

AUKS - Description Goal Provides Kerberos credentials in non interactive environment Batch systems, cron,... Description Kerberos distributed credential delegation system Kerberized client/server application External tool Can be integrated in different projects Linux tool Developed and tested on CentOS, RedHat, Fedora Opensource http://sourceforge.net/projects/auks/ M. Hautreux Kerberos Workshop 2010 9

AUKS - Overview Internals Multi-threaded C application Based on MIT kerberos implementation only (>1.3) Components Central Daemon (auksd) Kerberized server Authorizes requests using client principal and local ACLs Serves add/get/remove/dump TGT requests Stores user TGTs in a FS directory (for persistency) Client API (libauksapi) Kerberized client Provides functions to perform add/get/remove/dump requests Enables third party application to use AUKS functionalities Client program (auks) Encapsulate API functions Enable scripted use M. Hautreux Kerberos Workshop 2010 10

AUKS - Overview Auks Features Auksd Stores TGT by uid (TGT principal to local uid conversion) Only one TGT per user Get requests by uid Automatic TGT renew mechanism libauksapi Automatic switch to backup server Configurable retries, timeout and delay between retries Simplify auks integration in external projects HA Active/Passive Rely on external tool (PaceMaker) Requires a shared FS M. Hautreux Kerberos Workshop 2010 11

AUKS - Overview Auks Daemon M. Hautreux Kerberos Workshop 2010 12

AUKS - Overview Auks authorization rules Defined by ACLs Based on Requester Kerberos principal Requester host Determine requesters role Guest : add request for own cred only User : add/get/remove for own cred only Admin : add/get/remove/dump for all creds Auks renew mechanism Implemented as a dedicated client Running as a daemon With admin Auks role Dumping credentials periodically and refreshing them when required M. Hautreux Kerberos Workshop 2010 13

AUKS - Overview Long running Jobs Users can periodically refresh their Auks TGT Performing a new add request i.e. Once a day, a week,... Users/Batch systems can renew TGTs using Auks Performing a get request (user/admin only) Automatically using refreshed TGTs Scalability in parallel jobs Based on addressless TGT Obtained and used during add request Single addressless credential per user Stored in Auks Memory Cache Provided to requesters without KDC interaction Forwarding to thousands of peers without KDC interaction M. Hautreux Kerberos Workshop 2010 14

AUKS - Overview Auks protocol example scenario Alice forwards her TGT to the Auks daemon Alice asks Bob to execute her request Bob asks Auks for Alice TGT Bob executes Alice request using her kerberos identity M. Hautreux Kerberos Workshop 2010 15

AUKS - Scalability 3 stages communication protocol Request/Reply/Acknowledgement Leave the TIME_WAIT TCP state on client side Improve server request processing sustained rate TIME_WAIT is 60s long on Linux for ~65k ports Sustained rate > 1100 req/s is not possible Replay cache management Enabled by default in kerberos API Uses a single file per user/application Sync file on disk at each addition Multiple threads -> Contention on replay cache Can be disabled on demand in Auks Clusters internal networks can often be considered trusted Greatly improves parallel kerberos communications Choice depending on parallelism requirements M. Hautreux Kerberos Workshop 2010 16

AUKS - Scalability Addressless versus Addressed TGTs Addressed tickets Requires a KDC interaction for each forwarding operation KDC is single threaded Auks sustained rate becomes KDC sustained rate (~dozens of TGT per second) Addressless tickets Not need to acquire a new TGT for each requester Sustained rate only limited by Auks internals Renew mechanism User/Admin Auks roles enable to get TGTs TGTs can thus be renewed using Auks Renew sustained rate only limited by Auks internals Fallback to default renew mechanism (KDC) In case of temporary Auks failure that would result in invalid credentials M. Hautreux Kerberos Workshop 2010 17

AUKS - Scalability results TestBed 1 server + 100 clients SuperMicro 6015TW-INF Bi-Socket Quad-Core (Intel Harpertown 2.8 GHz) 16 Go RAM SATA Intel 3 GBps controller Protocol 5 consecutives batchs of 16000 simultaneous requests (20 requests per core) Various quantity of workers With or without replay cache Add versus get requests Measure average number of requests per second M. Hautreux Kerberos Workshop 2010 18

AUKS - Scalability results 8000 Requests per second depending on Auks daemon workers quantity ( With and without replay cache) 7000 6000 Requests per second 5000 4000 3000 2000 1000 0 0 200 400 600 800 1000 1200 Workers quantity Requests per sec (without replay cache) Requests per sec (with replay cache) M. Hautreux Kerberos Workshop 2010 19

AUKS - Scalability results 8000 Requests per second by type depending on AUKS daemon workers quantity 7000 6000 5000 Requests number 4000 3000 2000 1000 0 0 200 400 600 800 1000 1200 workers quantity Get Requests per sec (without replay cache) Add Requests per sec (without replay cache) M. Hautreux Kerberos Workshop 2010 20

AUKS Possible ways of enhancement Global scalability by TGS prefetching Current known limitation TGS still acquired using TGT on each node Using basic kerberos API (scalability issue) TGS prefetching Store addressless TGTs and TGSs using Auks Daemon TGS to prefetch based on already acquired TGS and a configurable per principal list As many KDC requests as users multiply by number of different kerberized services + 1 Auks becomes a KDC caching system Addressed TGT support Better security but with far less scalability High-Availability Active-Active architecture M. Hautreux Kerberos Workshop 2010 21

AUKS Batch systems integration Pluggable integration in Slurm A highly scalable resources manager Open source, mainly developped at LLNL https://computing.llnl.gov/linux/slurm/ Auks plugin for Slurm Included in Auks tarball Do not provide Kerberos authentication Provide kerberos credential support and renewal Really small overhead in jobs launches Sustained rate up to 7000 req/sec of auksd ~1 seconds overhead for a thousand nodes submission Every user job extends running jobs kerberos lifetime Due to internal Auks refresh mechanism Easily integrated in Cron Using auks command line M. Hautreux Kerberos Workshop 2010 22

From HPC site to HPC Grid Environment HPC environment Kerberos authentication on workstations With a background renew mechanism GSI-SSH for HPC site remote connections On both workstations and cluster nodes Compiled without GSI features (kerberos GSSAPI) Offers cascading credentials refresh (Single point of renewal) NFSv4 + kerberos for remote FS (site centric) Provide NAS with enhanced security Could be replaced with OpenAFS + kerberos M. Hautreux Kerberos Workshop 2010 23

From HPC site to HPC Grid Environment Grid environment X509 PKI for user identities management Users own x509 certificates and associated keys GSI-SSH to access HPC sites gateways Compiled with GSI features (GSI GSSAPI) Offers cascading proxy certificates refresh (since GSI-SSH-4.8) PAM-PKINIT on HPC sites gateways Experimental pam module to get TGT from proxy certs using PKINIT Linked to GSI-SSH cascading refresh for TGT acquisition http://sourceforge.net/projects/pam-pkinit/ GSI-SSH for HPC site remote connections Compiled without GSI features (kerberos GSSAPI) Offers cascading credentials refresh Automatically use TGT acquired by PAM-PKINIT Benefit from PAM-PKINIT refresh stages Enables kerberized access to all the HPC site M. Hautreux Kerberos Workshop 2010 24

Questions? M. Hautreux Kerberos Workshop 2010 25