HP Storage and Computing @ UMCG. Pieter Neerincx, Genomics Coordination Center, UMCG. SURF-DTL SIG Compute for life science research, April 22 2015, Utrecht
Topics: Expectation Management (shared lab / kitchen / cluster = shared responsibility); Disaster Recovery (Failover vs. Fallback); Data Management; Dependency Management
Disaster Recovery: Failover vs. Fallback. [Diagram labels: schedulers (PBS/Torque, PBS/Torque, SLURM); compute nodes (10, 5, 5, 10); UI servers; storage (homes, HA, 3x HP tmp); file systems (GPFS, GPFS, Lustre)]
Data Management: Why. There are 10 kinds of people: those who have lost data and those who will lose data. Make backups: Backup window, Restore window, Costs
Data Management: Why. Traceability, Reproducibility, Continuity. December 2015: All research @ UMCG ISO9001 certified. Volatile: CPUs, Mem, Network. Non-volatile: Storage
user 1 home
user 1 home -- Never ++++ Never ++ Never +++++ Daily: 3+ months old Quota Backup Auto Clean
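The "Auto Clean: Daily, 3+ months old" entry can be illustrated with a minimal sketch of how such a daily job might select candidates. Everything specific here is an assumption for illustration, not the actual UMCG cleanup job: the function name `find_stale_files`, reading "3+ months" as 90 days, and using the last modification time as the age criterion.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def find_stale_files(root, max_age_days=90, now=None):
    """Return sorted paths of files under root not modified for max_age_days.

    Hypothetical helper: the 90-day default approximates "3+ months old".
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: judge a symlink by its own age, not its target's.
                if os.lstat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                # File vanished or is unreadable; skip it in this run.
                continue
    return sorted(stale)
```

The sketch only reports candidates; whether they are deleted, archived, or reported to the owner is a policy decision left out here.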
home user 1 SFTP UI Nodes
Data Manager (DM): Policy differs per group. DM reviews documented data when moving to /. No DM; One dedicated DM; Everybody is also DM; Everybody is also DM, but you are not allowed to review your own documented data
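The four per-group policies can be read as a rule for who may review a given data set. The following is a minimal sketch under that reading; the names `DMPolicy` and `may_review`, and the interpretation that "No DM" means nobody reviews, are assumptions made for illustration.

```python
from enum import Enum


class DMPolicy(Enum):
    """Hypothetical labels for the four group policies on the slide."""
    NO_DM = "no DM"
    ONE_DEDICATED_DM = "one dedicated DM"
    EVERYBODY = "everybody is also DM"
    EVERYBODY_NOT_OWN = "everybody is also DM, but not for own data"


def may_review(policy, reviewer, data_owner, dedicated_dm=None):
    """Return True if reviewer may review data documented by data_owner."""
    if policy is DMPolicy.NO_DM:
        # Assumption: without a DM role there is no review step at all.
        return False
    if policy is DMPolicy.ONE_DEDICATED_DM:
        return dedicated_dm is not None and reviewer == dedicated_dm
    if policy is DMPolicy.EVERYBODY:
        return True
    if policy is DMPolicy.EVERYBODY_NOT_OWN:
        # Self-review is the one thing this policy forbids.
        return reviewer != data_owner
    raise ValueError("unknown policy: %r" % (policy,))
```

For example, under `EVERYBODY_NOT_OWN` a user named "alice" may review data documented by "bob" but not her own.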
Dependency Management. Runtime: environment modules / Lmod / other implementations; modifies environment. Analysis scripts: no hardcoded paths to software; no environment variables for software defined (only used); portable. Deploytime (Download-Decompress-Compile-Install-time): EasyBuild
Runtime DepMan: modules
Lmod (Lua implementation of modules), Texas Advanced Computing Center
    $> module avail GATK
    ----------------------- ///modules/ -----------------------
    GATK/2.7-4-g6f46d11    GATK/2.8-1-g932cd3a
    $> module load GATK/2.7-4-g6f46d11
    $> module list
    Currently Loaded Modulefiles:
      1) /jdk/1.7.0_25   2) /R/3.0.2   3) /GATK/2.7-4-g6f46d11
Runtime DepMan: modules
    $> module show GATK/2.7-4-g6f46d11
    ---------------------------------------------------------------
    ///modules//gatk/2.7-4-g6f46d11:
    module-whatis   Sets GATK environment.
    prereq          jdk/1.7.0_25
    setenv          GATK_HOME ////GATK-2.7-4-g6f46d11/
    ---------------------------------------------------------------
    $> java -jar ${GATK_HOME}/GenomeAnalysisTK.jar --version
    GATK-2.7-4-g6f46d11
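An analysis script that follows the "no hardcoded paths, environment variables only used, never defined" rule might look roughly like the sketch below. It reads `GATK_HOME` (set by the module shown above) and fails with a hint if the module was not loaded. The function name `gatk_command` and the optional `environ` parameter are hypothetical and exist only to keep the example testable.

```python
import os


def gatk_command(args, environ=None):
    """Build a GATK command line from GATK_HOME without hardcoding a path.

    The script only uses GATK_HOME; defining it is the module's job.
    """
    environ = os.environ if environ is None else environ
    gatk_home = environ.get("GATK_HOME")
    if not gatk_home:
        raise RuntimeError(
            "GATK_HOME is not set; load a GATK module first "
            "(e.g. 'module load GATK/<version>')"
        )
    jar = os.path.join(gatk_home, "GenomeAnalysisTK.jar")
    return ["java", "-jar", jar] + list(args)
```

The resulting list can be passed to `subprocess.run(gatk_command(["--version"]), check=True)`. Switching GATK versions then means loading a different module, with no change to the script.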
Deploytime DepMan: EasyBuild (HPC UGent, hpcugent.github.io/easybuild/). Components: EasyBuild Framework (Python), EasyBlocks, EasyConfigs
Deploytime DepMan: EasyBuild. Toolchain example: goolf-1.7.20 (GCC, OpenMPI, OpenBLAS, LAPACK, FFTW). Install example: eb BWA-0.7.12-goolf-1.7.20.eb
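The install example's file name appears to follow the pattern `<name>-<version>-<toolchain name>-<toolchain version>.eb`. The sketch below splits such a name into its parts to show how software and toolchain are pinned together. It is an illustration, not part of EasyBuild: `parse_easyconfig_filename` and `EasyConfigName` are hypothetical names, and the parser assumes that no component contains a hyphen and that there is no version suffix.

```python
from typing import NamedTuple


class EasyConfigName(NamedTuple):
    """Parts of an easyconfig file name (hypothetical helper type)."""
    name: str
    version: str
    toolchain_name: str
    toolchain_version: str


def parse_easyconfig_filename(filename):
    """Split '<name>-<version>-<tc name>-<tc version>.eb' into its parts.

    Simplifying assumption: exactly four hyphen-separated components.
    """
    if not filename.endswith(".eb"):
        raise ValueError("not an easyconfig file name: %r" % filename)
    stem = filename[: -len(".eb")]
    parts = stem.split("-")
    if len(parts) != 4 or not all(parts):
        raise ValueError("unexpected easyconfig name layout: %r" % filename)
    return EasyConfigName(*parts)
```

Applied to `BWA-0.7.12-goolf-1.7.20.eb`, this yields BWA version 0.7.12 built with the goolf toolchain version 1.7.20.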
Deploytime DepMan: EasyBuild. EasyBuild automates: fetch sources, decompress, configure, compile, install, generate module file. No root access required. Large collection of EasyConfigs shared by community
home (user 1). Actors: You, DM (Data Manager).
1. Modify/upload your personal configs/preferences
2. Perform experiment
3. Generate raw data
4. Document raw data (www.nature.com/scientificdata/)
5. Upload documented raw data
6. Contact Data Manager
7. Move or copy documented raw data
home (user 1). Actors: You, DM (Data Manager).
8. Copy raw data
9. Analyze data
10. Generate tmp data
11. Generate final results
12. Document final results
13. Contact Data Manager
14. Move documented final results
15. Cleanup
www.molgenis.org
?