Optimization of the job submission and data access in an LHC Tier2
Vincenzo Spinoso
EGI User Forum, Vilnius, 11-14 April 2011
Overview
- User needs
- Administration issues
- INFN Bari farm design and deployment
- Storage access optimization
- File system performance
- Performance over the WAN link
- Interactive jobs
User needs
- Grid submission
- Local submission
- Interactive facilities: code development, debugging, analysis with ROOT
- Personal research data: backups, editing
- Efficient I/O when serving analysis jobs: jobs may read from storage at 12 MB/s
- Fast and reliable WAN transfers (SRM, GridFTP, Xrootd)
Administration issues
- Improving reliability and efficiency of the services provided
- Sharing and consolidation to avoid duplication of services
- Support for different VOs
- Support for different use cases
- Support for different types of access (grid, local, interactive)
Farm layout (diagram)
Storage access
- Lustre: POSIX parallel file system
- StoRM: SRM layer on top of Lustre (CMS)
- Xrootd: ALICE production instance, CMS test instance
- Heterogeneous hardware: different storage brands, different technologies (HW/SW RAIDs, RAID 5/6, FC, external SAS)
(a sketch of the three access paths follows below)
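As an illustration of the three access paths, here is a minimal sketch. The hostnames, mount point and file path are hypothetical, and the client tools (lcg-cp for SRM, xrdcp for Xrootd) are only examples of standard grid clients, not necessarily the ones deployed at Bari.

    # POSIX access: worker nodes mount Lustre directly, so jobs read with ordinary I/O
    ls /lustre/cms/store/user/example/file.root

    # SRM access through StoRM (hypothetical endpoint), e.g. with the lcg-utils client
    lcg-cp -b -D srmv2 \
      "srm://storm.example.infn.it:8444/srm/managerv2?SFN=/cms/store/user/example/file.root" \
      file:///tmp/file.root

    # Xrootd access (hypothetical redirector), e.g. with xrdcp
    xrdcp root://xrootd.example.infn.it//store/user/example/file.root /tmp/file.root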
Storage pre-production
- Lustre, 5 disk servers, 4x 1 Gbps network each, 190 TB
- ~600 concurrent jobs
- Result: 400 MB/s read/write
Storage in production
- CMS job robot efficiency is 95%
Storage in production
- 250 TB used, 10 servers
- 800 concurrent jobs, real ROOT analysis
- Result: up to 1.3 GB/s (peak)
Storage in production
- 500 TB in production, 15 servers
- Real user activity
- Result: concurrent reads up to 2 GB/s (peak)
Storage in production
- 650 TB in production, 20 servers
- Real user activity
- Result: concurrent reads averaging 2 Gbps
CMS feedback from the grid
- I/O performance tests (L. Sala): CMS wall time per job, CMSSW_CpuPercentage (UserTime/WallTime)
- Feedback: CPU efficiency greatly improved, total execution time decreased
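The efficiency figure quoted above is the ratio reported by CMSSW, i.e. (in LaTeX notation):

    \text{CpuPercentage} = \frac{\text{UserTime}}{\text{WallTime}}

so a value close to 1 means the job spends almost all of its wall-clock time on the CPU rather than waiting for I/O.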
2 Gbps WAN link at Bari (network diagram)
Download from T1/T2: 173 MB/s
Download from T2: 145 MB/s
Upload to FNAL: Bari to FNAL at 237 MB/s
Xrootd tests
- ~50 jobs running at Trieste (CMS T3) read data stored at Bari (remote access using Xrootd)
- Bari-Trieste traffic shows spikes of 1 Gbps (a sketch of this kind of remote access follows below)
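A minimal sketch of this kind of remote access, assuming a hypothetical Xrootd redirector name at Bari (the real hostname and file path are not given in the slides):

    # Copy a file from the Bari Xrootd instance to the local worker node
    xrdcp root://xrootd.ba.infn.it//store/user/example/analysis.root /tmp/analysis.root

    # Or open it directly from ROOT, letting the analysis stream it over the WAN
    root -l -e 'TFile::Open("root://xrootd.ba.infn.it//store/user/example/analysis.root")'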
Interactive jobs: why
- Classic interactive cluster issues: maintenance (ad-hoc configuration, consistency), scalability, performance degradation under heavy load, different requirements from different use cases (even within the same VO)
- Interactive access through interactive jobs: the interactive submission works like a batch submission; the batch manager picks a CPU to execute the job and returns an interactive shell, and the user keeps that CPU until the interactive job is released (logout)
- Maintenance: a single cluster provides both batch and interactive access; the environment is the same, so there are no consistency issues
- Scalability: the interactive cluster can grow in size dynamically, depending on user requests
- Performance: one CPU per user, so users never share the same core
Interactive jobs: how
- Interactive jobs are a built-in feature of Torque (LSF provides them as well)
- The Maui configuration is tuned to give these jobs high priority
- A simple custom daemon guarantees that a user waits at most 60 seconds for interactive access
- Interactive jobs can be logged out of and re-entered later, using screen
- No hard limit on the number of concurrent interactive sessions
- Multi-CPU interactive jobs are also supported: the user can request n nodes with m processors per node (see the sketch below)
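A minimal sketch of what this looks like from the user's and the administrator's side; the queue name and priority value are assumptions, not the actual Bari configuration:

    # User side: -I asks Torque for an interactive job; the shell comes back on a worker node.
    # -l nodes=2:ppn=4 would instead request a multi-CPU job (2 nodes, 4 processors per node).
    qsub -I -q interactive -l nodes=1:ppn=1

    # Keep the session recoverable across disconnections by running screen inside it
    screen -S myanalysis

    # Administrator side (maui.cfg): one way to give an interactive class high priority
    CLASSCFG[interactive] PRIORITY=100000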
Interactive jobs and the file system
- Goal: a single file system for both user and global data, for all the VOs on the site
  - it had to be fast and POSIX compliant, to support interactive sessions just like a local file system
  - it had to be shared across all the nodes of the farm, so that both batch and interactive jobs could access the user home directories and the globally available data stored on site
  - it had to allow a warm (non-disruptive) upgrade of the disk space
- Choice: a high-performance POSIX cluster file system, Lustre, with StoRM on top to provide the SRM service (see the sketch below)
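A minimal sketch of how such a shared mount might look on a worker node; the MGS hostname, file system name and mount point are hypothetical, not the actual Bari setup:

    # Mount the Lustre file system on every worker node and user-interface node,
    # so batch and interactive jobs see the same POSIX namespace
    mount -t lustre mgs.example.infn.it@tcp:/lustrefs /lustre

    # Home directories and experiment data then live side by side on the shared mount;
    # StoRM exposes the same directories through SRM, and adding OSTs to Lustre grows
    # the space without unmounting (the "warm upgrade" mentioned above)
    ls /lustre/home /lustre/cms/store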
Interactive jobs example
1. Access the frontend
2. Get a CPU
3. Use the CPU
4. Release the CPU
5. Release the frontend shell
(a command-line walk-through follows below)
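As a sketch, the five steps above could look like the following session; hostnames are hypothetical and the exact commands depend on the site setup:

    # 1. Access the frontend
    ssh user@frontend.example.infn.it

    # 2. Get a CPU: submit an interactive job and wait for the shell on a worker node
    qsub -I

    # 3. Use the CPU: run the analysis interactively on the assigned worker node
    root -l myanalysis.C

    # 4. Release the CPU: log out of the interactive job
    exit

    # 5. Release the frontend shell
    exit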
Interactive jobs example (screenshot)
People involved
- Giacinto Donvito, INFN, Università di Bari
- Vincenzo Spinoso, INFN, Università di Bari
- Giorgio Pietro Maggi, INFN, Politecnico di Bari
References
- Lustre wiki: http://wiki.lustre.org/index.php/main_page
- StoRM: http://storm.forge.cnaf.infn.it
- Xrootd: http://xrootd.slac.stanford.edu/
- Interactive jobs using qsub: http://www.clusterresources.com