SPARC 2 Consultations
January-February 2016
Outline
- Introduction to Compute Canada
- SPARC 2 Consultation Context
- Capital Deployment Plan
- Services Plan
- Access and Allocation Policies (RAC, etc.)
- Discussion
Introduction to Compute Canada
Compute Canada (CC): An Effective Provider of Essential Digital Research Infrastructure
Compute Canada, working through a federated partnership with the regional organizations ACENET, Calcul Québec, Compute Ontario and WestGrid, leads the acceleration of research and innovation by deploying advanced research computing (ARC) systems, storage and software solutions.
- CC is a not-for-profit corporation. Its membership includes most of Canada's major research universities.
- CC acts as a steward of Canada's ARC platform:
  - Compute and storage resources, data centres
  - A team of ~200 experts in the use of ARC for research
  - Hundreds of research software packages
  - Cloud compute and storage (OpenStack, ownCloud)
  - National services
- CC is a proud ambassador for Canadian excellence in advanced research computing, nationally and internationally.
Canada's ARC Platform Today & Tomorrow: A Distributed Partnership
- Today: services distributed across Canada
  - 50 systems in 27 data centres
  - 200,000 cores, 2 Pflops, 20 PB
  - 200 experts
- By 2018: consolidation and concentration
  - 5-10 data centres
  - 300,000 cores, 12 Pflops, 50+ PB (Challenge 2)
  - 200 experts
- Connected by CANARIE and regional networks
- Continued investment is required for Canadian science to compete globally
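The two platform snapshots above imply how much per-core throughput is expected to grow. A minimal Python sketch of that arithmetic, assuming the quoted Pflops are peak figures averaged over all cores (the slide does not say whether accelerators are counted separately); the function name is illustrative only:

```python
# Platform snapshots quoted on the slide: (cores, peak petaflops).
# Assumption: Pflops are peak figures spread evenly over all cores.
SNAPSHOTS = {
    "today": (200_000, 2.0),
    "2018": (300_000, 12.0),
}

def gflops_per_core(cores: int, pflops: float) -> float:
    """Average peak GFLOPS per core (1 Pflops = 1,000,000 GFLOPS)."""
    return pflops * 1_000_000 / cores

for label, (cores, pflops) in SNAPSHOTS.items():
    print(f"{label}: {gflops_per_core(cores, pflops):.0f} GFLOPS/core")
# today: 10 GFLOPS/core
# 2018: 40 GFLOPS/core
```

Cores grow 1.5x while peak flops grow 6x, so the average per-core figure rises roughly 4x, in line with the expectation that systems will be more powerful while the number of cores will not rise significantly.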
Member locations and new national hosting sites
Services Too...
Access and Allocations
- All Canadian faculty members have access to Compute Canada systems and can sponsor others in their name.
- Each system has resources set aside for users with default priority; no special vetting or application process is required.
- Researchers with larger needs can apply to two different resource allocation competitions:
  - RAC: 1-year allocations, mostly to individual faculty members
  - RPP: allocations of up to 3 years for platforms and portals and shared datasets
- Storage is a dedicated allocation; compute is a priority allocation.
- Allocation decisions are made based on peer review.
Serving Researchers in all Disciplines
The Funding Model
Figures for 2014-2015:
- Roughly $30M/year in operating funding in 2014/15
- The partner funding model ensures alignment of objectives.
- Capital and operating costs are funded with the same model:
  - 40% through the Canada Foundation for Innovation (CFI) (MSI programme for operations, Cyberinfrastructure Initiative for capital)
  - 60% from universities, provinces and other sources
- National leadership ensures strategic focus and accountability.
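Applying the 40/60 split is simple arithmetic. The sketch below assumes the ~$30M/year figure is the total operating budget from all sources, and uses the $75M total project cost of the July 2015 CFI award; `split_funding` is a hypothetical helper:

```python
CFI_PERCENT = 40  # CFI share under the partner funding model (from the slide)

def split_funding(total_musd: float) -> tuple[float, float]:
    """Hypothetical helper: split a total cost in $M into (CFI share, partner share)."""
    cfi = total_musd * CFI_PERCENT / 100
    return cfi, total_musd - cfi

# Assumption: ~$30M is the total 2014/15 operating budget, not the CFI portion alone.
for label, total in [("operating, 2014/15", 30), ("capital, July 2015 award", 75)]:
    cfi, partners = split_funding(total)
    print(f"{label}: CFI ${cfi:.0f}M, universities/provinces/other ${partners:.0f}M")
# operating, 2014/15: CFI $12M, universities/provinces/other $18M
# capital, July 2015 award: CFI $30M, universities/provinces/other $45M
```

The capital line reproduces the announced $30M CFI investment against a $75M total project cost, i.e. the same 40% share used for operations.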
SPARC 2 Consultation Context
Current Status - New Systems Coming
- Compute Canada received good news from CFI in July 2015: $30M in new infrastructure investments ($75M total project cost)!
- Some RFPs have already been issued, and new equipment is coming.
- New major systems will be deployed this year.
- However, many existing systems are nearing (or past!) end-of-life.
- 2016-17 is about commissioning new systems while decommissioning old ones.
- Systems will be more powerful, but the number of cores will not rise significantly.
- Storage capacity will increase dramatically.
Current Status - Times are Tight
- Demand continues to grow. The 2016 competition has just been completed, with 366 applications:
  - 16% increase in CPU ask (after correction)
  - 34% increase in storage ask (after correction)
  - 123% increase in GPU ask (after correction)
- New storage is coming soon; some storage allocations were granted with a delayed start.
- 42 projects (13%) that requested compute allocations were not awarded any compute allocation, compared with 4% last year. (Note: all are funded researchers.)
- Average award:
  - 57% of compute request (65% last year, 84% in 2012)
  - 82% of storage request
  - 19% of GPU request
- The 2017 competition will also be tough.
Funding Opportunities - 2016 and beyond
Operating:
- Current operations funding (CFI MSI) expires March 31, 2017.
- CC (through Western University) has submitted an NOI for the next competition (2017-2022).
- The full CC MSI proposal is due May 20, 2016.
Capital:
- Currently purchasing infrastructure through the CFI Cyberinfrastructure Initiative - Challenge-2, Stage-1. Expect to be fully deployed by the end of 2017.
- Expect to be given the opportunity to apply for additional capital funds in conjunction with the MSI renewal proposal (May 20, 2016).
- Expect additional capital funding opportunities in connection with the mid-term report on the next MSI (likely required by spring 2020).
The next 3-4 months are critical for planning Canada's ARC future through 2022!
Ways to Provide Feedback
www.computecanada.ca/sparc2/
- In person: speak up in this meeting!
- Virtually: video-conferenced consultations (Feb. 3 and 22, in English)
- Via a White Paper
- Via a brief (5-minute) survey: www.surveymonkey.com/r/v59zdgv
- Via email (any time): sparc@computecanada.ca
Note: 2014 White Paper responses from 20+ disciplinary organizations, universities and individuals had a strong influence on the current technology plan.
White Papers
Updates to the 2014 SPARC v1 White Papers are welcome!
- Introduction to your disciplinary use of ARC
- Status quo for utilization of current resources
- What challenges have you encountered with your use of the ARC that Compute Canada provides?
- What are your anticipated resource needs into the future (ideally, through 2022)?
  - Computation
  - Storage
  - Services
  - Support
- What are some of the new technologies, services, support, etc., that you would like Compute Canada to investigate or provide? On what timeline?
White Papers - Guide Included on Website
SPARC Survey
www.surveymonkey.com/r/v59zdgv
Technology Deployment Plan
Capital Planning Timeline
CFI Challenge-2, Stage-1 (announced)
- $30M CFI investment announced, July 2015
- 2015: National Data Infrastructure RFP launched; deployment in 2016
- 2016: 3 new systems to be deployed
- 2017: 1 new system to be deployed, potentially 2 systems upgraded
- April 1, 2018: spending complete
CFI Challenge-2, Stage-2 (assumed for planning purposes)
- Deadline May 20, 2016; decision September 2016
- Site selection process underway now
- 2017: first purchases
- April 1, 2020: spending complete
CFI Challenge-2, Stage-3 (assumed for planning purposes)
- Coincident with the MSI mid-term review (2019/2020)
- First spend in 2020/2021 (roughly the replacement timeline for Stage-1 purchases)
Capital Deployment Plan 2016-17
www.computecanada.ca/wp-content/uploads/2015/11/computecanada-technology-briefing-2015.pdf
CC submitted a capital proposal to CFI in April 2015, including an investment plan for four national sites. Key components:
- Addresses pressing and urgent needs as older systems are defunded
- Concentrated investment in 4 large sites, with a national procurement process
- National Storage Architecture (60+ PB of new storage)
- Greatly expanded cloud (OpenStack) capacity
- Greatly expanded accelerator (GPU) capacity
- Some heterogeneous systems with large-memory (1 TB+) nodes
Note: In parallel, CFI has run a Challenge-1 competition. The investments in the CC capital deployment plan include infrastructure and tool development designed to support those projects.
Capital Deployment Plan 2016-17
Note: Over the same period, we will be decommissioning 82,000 existing CPU cores and a large fraction of existing disk storage.
Capital Deployment Plan 2016-17
Capital Plan 2017-19 (Stage 2)
The capital plan for Stage 2 will be built between now and May 20, 2016. CFI is expected to require CC to propose 3 different technology options, with science justifications for each.
Expectations:
- Addition of 1-3 new national sites
- Expansion of some existing national sites
- Expansion of national storage infrastructure
Decisions to be made:
- Balance of large parallel, general-purpose and cloud?
- Emphasis on new architectures?
- Emphasis on accelerators?
- Memory per node?
- Services: databases, storage platforms, private networks?
Services Plan
Compute Canada Services - Middleware
- We are service providers, not just infrastructure providers.
- The CC user base is broadening, bringing a broader set of needs.
- We have seen tremendous interest in services enabling Research Data Management (RDM).
- Through Challenge-1 and our Research Platforms and Portals competitions, we have identified an additional list of middleware services that CC will implement in common across our sites:
  - Authentication and ID management
  - Data transfer
  - Software distribution
  - Monitoring (system status)
  - Resource publishing (available capacity)
CC Services - Disciplinary Support
- Compute Canada expert research support is built around excellent local services: experts on your campus.
- In 2015 we augmented this through the creation of our first national disciplinary support team, in digital humanities.
- Disciplinary support teams:
  - encourage sharing of best practices across the country
  - work on discipline-specific documentation
  - perform outreach to Canadian practitioners
  - identify weaknesses in the support model or infrastructure plan with respect to each disciplinary group
- We are happy to take feedback on where you think more support is needed:
  - Should we create a new team in a certain area?
  - Should the list of responsibilities above (per team) be expanded?
CC Services - Research Support
- Currently, expert support is generally:
  - local (on campus)
  - short-term (days, not months)
- We get requests for long-term (embedded) research support. This is currently offered on a competitive basis in some regions, but not as a national service.
- Should we offer embedded (long-term) support? On what basis? Paid? Competitive?
CC Services - Training
- Compute Canada currently offers training across the country in:
  - Code optimization
  - Use of specific hardware platforms or software services
  - Basic and advanced HPC techniques
- Most training is local/regional, with local courses offered by local staff.
- National initiatives include:
  - National partnership with Software Carpentry
  - International partner in the International HPC Summer School
  - Discussions with Data Carpentry
- We welcome feedback on training emphasis. Where are the gaps today?
CC Services - Security and Privacy
- More and more ARC is being used for research involving personal information (e.g., health, social sciences, industry data).
  - Policies must be in place to protect personal information.
  - Physical and network security must be in place to protect data held on CC systems.
  - Data isolation must be assured for special projects that require it.
- CC has adopted a new security framework: an information security management system (ISMS) that follows ISO/IEC 27001 (operations and standards: ISO/IEC 27002).
  - This is the minimum standard in all CC data centres; some will be designated for higher-security data sets.
  - New network and storage designs support data isolation.
Access and Allocation Policies
CC Access Policy
- The current access policy is organized by sponsor. CC approves the sponsor; the sponsor approves any and all group members.
- Group members can be students, postdocs, external collaborators, etc.
- All usage is charged to the sponsor.
  - There is no fee for usage charged to Canadian university faculty.
  - When the sponsor is from private industry, all usage is subject to a fee.
  - When the sponsor is from a federal laboratory or other not-for-profit, a reduced fee applies.
- Teaching is not an eligible use (though training is).
- Has this policy ever been an impediment to your research? Suggestions?
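The sponsor-based model can be sketched as a small data structure: membership is controlled by the sponsor, and every member's usage rolls up to the sponsor's account, whose fee category depends only on the sponsor type. This is a hypothetical illustration, not CC's accounting system; `SponsorGroup`, the core-hour unit and the category strings are assumptions, and no fee amounts are implied:

```python
from dataclasses import dataclass, field
from enum import Enum

class SponsorType(Enum):
    CANADIAN_FACULTY = "Canadian university faculty"
    INDUSTRY = "private industry"
    FEDERAL_OR_NONPROFIT = "federal laboratory or other not-for-profit"

# Fee category by sponsor type, as described on the slide (no actual rates given).
FEE_CATEGORY = {
    SponsorType.CANADIAN_FACULTY: "no fee",
    SponsorType.INDUSTRY: "fee",
    SponsorType.FEDERAL_OR_NONPROFIT: "reduced fee",
}

@dataclass
class SponsorGroup:  # hypothetical model of one sponsor's group
    sponsor: str
    sponsor_type: SponsorType
    members: set[str] = field(default_factory=set)
    core_hours_charged: float = 0.0  # assumed usage unit

    def approve_member(self, name: str) -> None:
        # CC approves the sponsor; the sponsor approves group members.
        self.members.add(name)

    def record_usage(self, user: str, core_hours: float) -> None:
        # Usage by the sponsor or any approved member is charged to the sponsor.
        if user != self.sponsor and user not in self.members:
            raise PermissionError(f"{user} is not in {self.sponsor}'s group")
        self.core_hours_charged += core_hours

    def fee_category(self) -> str:
        return FEE_CATEGORY[self.sponsor_type]

group = SponsorGroup("Prof. A", SponsorType.CANADIAN_FACULTY)
group.approve_member("graduate student B")
group.approve_member("external collaborator C")
group.record_usage("graduate student B", 120.0)
group.record_usage("external collaborator C", 30.0)
print(group.fee_category(), group.core_hours_charged)
# no fee 150.0
```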
CC Resource Allocation Policies
- All users have default access to each CC system (compute, storage).
- Users can apply for special resource allocations for:
  - Compute (priority in a shared system, in core-years)
  - Storage (dedicated, short-term or long-term)
  - Cloud resources (virtual machines, public IP addresses, etc.)
- CC allocates about 80% of the available core-years each year through competitive processes, leaving up to 20% for default access.
- There are two categories of competition, with one competition period per year:
  - RAC: generally single-investigator projects
  - Research Platforms and Portals (RPP): shared datasets, possible multi-year allocations
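A core-year is one core used for a full year, so a system's annual capacity in core-years equals its core count. A minimal sketch of the 80/20 split for a hypothetical 10,000-core system, ignoring downtime and maintenance (an assumption); since compute allocations are priorities rather than reservations, 100 core-years means roughly 100 cores kept busy on average over the year:

```python
HOURS_PER_YEAR = 365 * 24   # 8,760 hours; leap years ignored
COMPETITIVE_PERCENT = 80    # "about 80%" allocated through RAC/RPP

def annual_core_years(cores: int) -> dict[str, float]:
    """Split one year of a (hypothetical) system's capacity into competitive vs default core-years."""
    available = float(cores)  # one core running for a full year = 1 core-year
    competitive = cores * COMPETITIVE_PERCENT / 100
    return {
        "available": available,
        "competitive": competitive,
        "default": available - competitive,
    }

def core_years_to_core_hours(core_years: float) -> float:
    return core_years * HOURS_PER_YEAR

print(annual_core_years(10_000))
# {'available': 10000.0, 'competitive': 8000.0, 'default': 2000.0}
print(core_years_to_core_hours(100))
# 876000
```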
CC Resource Allocation Policies
Competition is based on peer review:
- Technical review to correct asks
- 7 disciplinary panels (78 panelists this year):
  - multiple independent reviews per proposal
  - panel review meeting to set the science score
- Multidisciplinary panel review of the (about 30) largest proposals
If the panel process does not result in a balanced budget, CC applies a scaling function based on the science score from the panel process.
[Figure: 2016 example (compute); legend: Default, Priority]
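The slides do not give the shape of CC's scaling function. As a hypothetical illustration only, the sketch below scales each request linearly by its science score (on an assumed 1-5 scale), caps each award at the request, and finds a single global factor `s` by bisection so that total awards match available capacity:

```python
def scaled_awards(requests, scores, capacity, max_score=5.0, iterations=60):
    """Hypothetical science-score scaling to balance the compute budget.

    Each award is min(request, s * (score / max_score) * request), where the
    single global factor s is found by bisection so awards sum to capacity.
    """
    if any(score <= 0 for score in scores):
        raise ValueError("this sketch assumes strictly positive science scores")
    if sum(requests) <= capacity:
        return [float(r) for r in requests]  # budget already balanced: no scaling

    def awards(s):
        return [min(r, s * (score / max_score) * r) for r, score in zip(requests, scores)]

    lo, hi = 0.0, 1.0
    while sum(awards(hi)) < capacity:  # widen the bracket until it covers capacity
        hi *= 2
    for _ in range(iterations):        # total awarded is non-decreasing in s
        mid = (lo + hi) / 2
        if sum(awards(mid)) < capacity:
            lo = mid
        else:
            hi = mid
    return awards(lo)

# Three hypothetical projects each asking 100 core-years, 150 core-years available.
print([round(a, 1) for a in scaled_awards([100, 100, 100], [5, 4, 2], 150)])
# [68.2, 54.5, 27.3]
```

Because one factor is shared by all projects, higher-scoring proposals keep a larger fraction of their ask, and every positively scored proposal receives something.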
CC Resource Allocation Questions
- Competition frequency: is once per year, plus ad-hoc out-of-round requests, enough?
- Award duration: is a single year, with a fast track, long enough? Note that CC must report every year, so a progress report is always needed.
- The Canadian Common CV (CCV) was introduced for the 2016 competition. How can we improve the CCV experience?
- Compute scaling is based on the science score. Alternatives: rank-and-cut? A different function shape?
- Regarding the connection between tri-council research grants and CC resource allocations: successful grant recipients still need to apply for Compute Canada resources. Is this double jeopardy unavoidable?
- We are sometimes asked if users can contribute additional resources. Is there significant demand for a price list?
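For comparison, one variant of rank-and-cut can be sketched in a few lines: fund full requests in descending order of science score and stop at the first project that no longer fits. The function name and the strict cut line (no partial award to the marginal project, no skipping ahead to smaller requests) are assumptions for illustration:

```python
def rank_and_cut(requests, scores, capacity):
    """Hypothetical rank-and-cut: fund full requests by descending score, then cut."""
    order = sorted(range(len(requests)), key=lambda i: scores[i], reverse=True)
    awards = [0.0] * len(requests)
    remaining = capacity
    for i in order:
        if requests[i] > remaining:
            break  # cut line: this project and all lower-ranked ones get nothing
        awards[i] = float(requests[i])
        remaining -= requests[i]
    return awards

# Three hypothetical projects each asking 100 core-years, 150 core-years available.
print(rank_and_cut([100, 100, 100], [5, 4, 2], 150))
# [100.0, 0.0, 0.0]
```

Only the top-ranked project is funded here, and 50 core-years go unallocated under this strict variant, whereas scaling by science score would give all three a partial award.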
Ways to Provide Feedback
www.computecanada.ca/research-portal/feedback/sparc2/
- In person: speak up in this meeting!
- Virtually: video-conferenced consultations (Feb. 3 and 22, in English)
- Via a White Paper
- Via a brief (5-minute) survey: www.surveymonkey.com/r/v59zdgv
- Via email (any time): sparc@computecanada.ca
An aside: account renewal is coming in March. We intend to collect CCVs.
Thank You!
At the Limit of Our Capacity
Projecting Increased Compute Demand
Based on SPARC white paper projections (research roadmaps)

White Paper                       Predicted Increase, Current to 2020
Numerical Relativity              3x
Subatomic Physics                 3x
Materials Research                5x
Canadian Genome Centres           8x
Canadian Astronomical Society     10x
Theoretical Chemistry             12x

Also projected:
- Clear need for accelerators
- Clear need for a mix of memory sizes
Projecting Compute Demand: 7x / 5 years
Averaging SPARC white paper projections (research roadmaps)
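The slide does not state how the white paper projections were averaged, and the 7x figure may draw on more white papers than the six listed in the table. As a rough check using only those six values (an assumption), a few lines of Python:

```python
from statistics import geometric_mean, mean

# Predicted increase to 2020 for the six white papers listed in the table only.
projections = {
    "Numerical Relativity": 3,
    "Subatomic Physics": 3,
    "Materials Research": 5,
    "Canadian Genome Centres": 8,
    "Canadian Astronomical Society": 10,
    "Theoretical Chemistry": 12,
}
factors = list(projections.values())
print(f"arithmetic mean: {mean(factors):.1f}x")            # arithmetic mean: 6.8x
print(f"geometric mean:  {geometric_mean(factors):.1f}x")  # geometric mean:  5.9x
```

The arithmetic mean (about 6.8x) is close to the quoted 7x. The geometric mean, often preferred for multiplicative growth factors, is lower (about 5.9x) because it damps the largest projections.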
Projecting Storage Demand: 15x / 5 years
Averaging SPARC white paper projections (research roadmaps)
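Five-year totals are easier to compare with procurement cycles when expressed as yearly rates. A minimal sketch, assuming steady compound growth (the slides give only the 5-year totals); `annual_growth` is a hypothetical helper:

```python
def annual_growth(total_factor: float, years: int) -> float:
    """Compound annual growth factor implied by total_factor growth over `years`."""
    return total_factor ** (1 / years)

for resource, factor in [("compute", 7), ("storage", 15)]:
    rate = annual_growth(factor, 5)
    print(f"{resource}: {factor}x over 5 years = {rate:.2f}x/year (+{(rate - 1) * 100:.0f}% per year)")
# compute: 7x over 5 years = 1.48x/year (+48% per year)
# storage: 15x over 5 years = 1.72x/year (+72% per year)
```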