Bill Boroski LQCD-ext II Contractor Project Manager

Similar documents
FY17/FY18 Alternatives Analysis for the Lattice QCD Computing Project Extension II (LQCD-ext II)

BNL FY17-18 Procurement

arxiv: v1 [hep-lat] 13 Jun 2008

DoD Environmental Security Technology Certification Program (ESTCP) Tim Tetreault DoD August 15, 2017

Progress Report. Project title: Resource optimization in hybrid core networks with 100G systems

State of Florida Enterprise

DHS Overview of Sustainability and Environmental Programs. Dr. Teresa R. Pohlman Executive Director, Sustainability and Environmental Programs

GRID MODERNIZATION INITIATIVE PEER REVIEW

Implementation Status & Results Montenegro Energy Community of South East Europe APL 3 - Montenegro Project (P106899)

SUBJECT: PRESTO operating agreement renewal update. Committee of the Whole. Transit Department. Recommendation: Purpose: Page 1 of Report TR-01-17

MINUTES COMMITTEE ON GOVERNANCE Conference Call April 7, 2010

Information Technology Services. Informational Report for the Board of Trustees October 11, 2017 Prepared effective August 31, 2017

Scientific Computing at SLAC. Amber Boehnlein

Introduction. Amber Boehnlein July, Sci Comp SLAC Scientific Computing Workshop 1

LQCD Computing at BNL

Organizational Update: December 2015

Working Together to Create Sustainable Success. APICS Board of Directors Meeting Update April 2012

Council, 26 March Information Technology Report. Executive summary and recommendations. Introduction

Interim Report Technical Support for Integrated Library Systems Comparison of Open Source and Proprietary Software

LQCD Facilities at Jefferson Lab. Chip Watson May 6, 2011

Big Computing and the Mitchell Institute for Fundamental Physics and Astronomy. David Toback

Public Disclosure Copy

Figure 1: Summary Status of Actions Recommended in June 2016 Committee Report. Status of Actions Recommended # of Actions Recommended

12 Approval of a New PRESTO Agreement Between York Region and Metrolinx

Project Overview and Status

NAC Institutional Committee Meeting

NC Education Cloud Feasibility Report

Intro To WestGrid. WestGrid Seminar Series Jill Kowalchuk, WestGrid CAO September 17, 2008 Access Grid / Webcast.

What future changes are planned to improve the performance and reliability of the Wairarapa Connection?

Council, 8 February 2017 Information Technology Report Executive summary and recommendations

PISMO BEACH COUNCIL AGENDA REPORT

Governing Body 313th Session, Geneva, March 2012

arxiv: v1 [hep-lat] 1 Dec 2017

Information technology statement of directions

PrairieCat Governing Bodies, Committees and Meetings for FY2019

Library. Summary Report

DEPARTMENT OF THE ARMY *ER U.S. Army Corps of Engineers CECW-EC Washington, DC Administration PROJECT SCHEDULES

Avoid a DCIM Procurement Disaster

The Project Charter. Date of Issue Author Description. Revision Number. Version 0.9 October 27 th, 2014 Moe Yousof Initial Draft

Facilities. Western Illinois University Quad Cities. Consolidated Fiscal-Year 2011 Annual Report and Fiscal-Year 2012 Planning Document

Corporate IT Survey Messaging and Collaboration, Editor: Sara Radicati, Ph.D

Federal Data Center Consolidation Initiative (FDCCI) Workshop III: Final Data Center Consolidation Plan

ISACA MADRID DECEMBER Robert E Stroud CEGIT CRISC International President December 2014

ERS IT Portfolio Report

Department Heads Meeting December 17, 2009

MOTION NO. M Contract Amendment for Technology Software, Hardware, and Related Maintenance Services

Freedom of Information Act 2000 reference number RFI

PROFESSIONAL CERTIFICATES AND SHORT COURSES: MICROSOFT OFFICE. PCS.uah.edu/PDSolutions

San Francisco Housing Authority (SFHA) Leased Housing Programs October 2015

AEC Broadband Update 2018

SPARC 2 Consultations January-February 2016

CIP Standards Update. SANS Process Control & SCADA Security Summit March 29, Michael Assante Patrick C Miller

2017 Resource Allocations Competition Results

GENERATOR ADDENDUM TO MAGNOLIA GARDENS ASSISTED LIVING COMPREHENSIVE EMERGENCY MANAGEMENT PLAN

July 30, Q2 Quarterly Report on Progress in Processing Interconnection Requests; Docket No. ER

INFORMATION TECHNOLOGY ONE-YEAR PLAN

Budget Review Process (BRP) Preliminary List of Business Initiatives. Stakeholder Meeting April 10, 2017

Proposed Increase In Rates In Water, Sewer and Reclaimed. June 9, 2009

BEFORE THE ARKANSAS PUBLIC SERVICE COMMISSION ) ) ) ) ) ) ) ) PHASE II REBUTTAL TESTIMONY KURTIS W. CASTLEBERRY DIRECTOR, RESOURCE PLANNING

Office of Acquisition Program Management (OAPM)

CompTIA Project+ (2009 Edition) Certification Examination Objectives

Chapter 8: SDLC Reviews and Audit Learning objectives Introduction Role of IS Auditor in SDLC

Transforming Accenture s IT While Reducing Cost Robert E. Kress, COO, Internal IT May 2011

FY18 FULL-YEAR RESULTS 31 AUGUST 2018 NEXTDC LIMITED ACN

TX CIO Leadership Journey Texas CIOs Bowden Hight Texas Health and Human Services Commission Tim Jennings Texas Department of Transportation Mark

MISO PJM Joint and Common Market Cross Border Transmission Planning

USGv6: US Government. IPv6 Transition Activities 11/04/2010 DISCOVER THE TRUE VALUE OF TECHNOLOGY

APNIC Update. German Valdez External Relations Program Director, APNIC ARIN XXX 26-October-2012

INSPIRE status report

SME License Order Working Group Update - Webinar #3 Call in number:

LRI - Land Registers Interconnection 3

Overview of the USGS Plan for Quality Assurance of Digital Aerial Imagery

ITD SERVER MANAGEMENT PROCEDURE

The Computation and Data Needs of Canadian Astronomy

NJ Energy Master Plan. NJ Energy Master Plan Energy Efficiency Strategy: Draft Procurement Structure. Presented to: Energy Efficiency Stakeholders

e-sens Nordic & Baltic Area Meeting Stockholm April 23rd 2013

UNCLASSIFIED. UNCLASSIFIED R-1 Line Item #49 Page 1 of 10

DATE OF BIRTH SORTING (DBSORT)

Steve Ashbaker Director Operations. NASPI Workgroup Meeting February 21, 2013 Atlanta, GA

E-rate Program Applicant Training Fiber Options

Comprehensive Professional Energy Services Blanket Purchase Agreements Ordering Guide

Transform your Microsoft Project schedules into presentation reports with Milestones Professional.

Compliance Enforcement Initiative

Module 3. Overview of TOGAF 9.1 Architecture Development Method (ADM)

Cloud Transformation Program Cloud Change Champions May 23, 2018

Regional TSM&O Vision and ITS Architecture Update

Portlet Reference Guide. Release

Update on our neutrino projects: LBNF/DUNE and SBN

Content Creation & Dissemination Team Recommendations for Annual Goals October 17, 2014

Use Case Study: Reducing Patient No-Shows. Geisinger Health System Central and Northeastern Pennsylvania

NERSC Site Update. National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory. Richard Gerber

ERO Enterprise Strategic Planning Redesign

REPORT TO THE 2012 LEGISLATURE SEMI-ANNUAL REPORT ON HAWAI I CANCER RESEARCH SPECIAL FUND HRS 304A-2168

Information Systems Analysis and Design CSC340. XXV. Other Phases

CIMA Certificate BA Interactive Timetable

Schematic Design Overview Project Review Stakeholder Engagement Project Delivery Method Upcoming Milestones

OCM ACADEMIC SERVICES PROJECT INITIATION DOCUMENT. Project Title: Online Coursework Management

NextGen Update. April Aeronautics and Space Engineering Board

LXI Technical Committee Overview

DRAFT. IT Services Strategic Plan

Transcription:

Bill Boroski LQCD-ext II Contractor Project Manager boroski@fnal.gov Robert D. Kennedy LQCD-ext II Assoc. Contractor Project Manager kennedy@fnal.gov USQCD All-Hands Meeting Jefferson Lab April 28-29, 2017

LQCD-ext II progress to date Updates to our baseline operations plan Organizational changes Planning for the annual DOE review FY17 hardware acquisition activities FY18 acquisition plans User survey results Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 2

We re in the third year of the 5-year extension (Oct 2014-Sep 2019) We ve received $8M of our planned $14M in funding (57%), in accordance with our baseline funding profile ($2M in FY15; $3M in FY16, $3M in FY17). The computing we ve delivered to the collaboration through March 2017 continues to exceed our baseline goals (TF-yrs delivered). Goal FY17 1 Actual Cumulative (Oct 15 Thru Mar 17 % of Goal Goal Actual % of Goal Conventional 68.2 73.4 108% 257.9 279.2 108% Resources 2 Accelerated 40.9 43.5 106% 257.5 269.5 105% Resources 3 1) FY17 performance through March 2017. 2) Conventional resources operational in FY17: Bc, Pi0,12s, 16p, BG/Q, 10% of DD2 prototype BG/Q rack (Bs retired Dec 2016) 3) Accelerated resources operational in FY17: Pi0g, 12k, (10g and 11g retired Dec 2016, BNL-IC brought online Jan 2017). Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 3

FY17 data for conventional resources are shown. Goals are being exceeded because of excellent uptime at all three sites and running Ds beyond planned retirement date. The uptime goal is 8000 hours per year (91.3%). Performance goal is based on an average of the sustained performance of domain wall fermion (DWF) and highly improved staggered quark (HISQ) algorithms FY17 data for accelerated clusters is shown. Goals are being exceeded due to excellent uptime at all three sites and running Dsg, 10g and 11g beyond planned retirement dates. The uptime goal is 8000 hours per year (91.3%). Conversion from GPU-hrs. to effective TF-yrs is 140 GF/GPU, based on allocation-weighted performance of GPU projects running from July 1, 2012 through Dec 2012. Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 4

Site Operations (CR16-01) Baseline operations plan called for cluster hosting at FNAL and JLab through Sep 2019, and operation of the BG/Q half-rack at BNL through Sep 2017. Change Request 16-01 was approved by Change Control Board (CCB) and Federal Project Director as required. BNL began delivering cluster computing resources in Jan 2017. BNL will purchase, deploy and operate new LQCD clusters in future years (planning for the FY17 acquisition is in process). Performance Goals (CR16-02) The approved baseline defined performance goals separately for conventional and GPU-accelerated machines. New computing architectures required us to redefine and combine these performance goals. New MIC technologies do not neatly fit into either category, constraining the computing project to only invest in Conventional and Accelerated Computing at a certain level each year in order to be judged successful. Change Request 16-02 was approved by the CCB and Federal Project Director as required. Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 5

Organizational changes: We have included Ted Barnes (DOE-ONP) to acknowledge the very active role he continues to play on the project. Alexandr Zaytsev has replaced Shigeki Misawa as co-site Architect at BNL. Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 6

2017 Annual Review scheduled for May 16-17 at Fermilab Review charge very similar to previous years. Continued significance and relevance of the LQCD-ext II project, with an emphasis on its impact on the experimental programs support by OHEP and ONP Progress towards scientific and technical milestones Status of technical design and proposed technical scope for FY17 Feasibility and completeness of proposed budget & schedule Responsiveness to recommendations from last year s review Effectiveness of USQCD in allocating LQCD-ext II resources to its community of lattice theorists but with a formal request for USQCD to present its plans for further capacity computing Will USQCD be requesting a further extension of the IT hardware project beyond FY19? If so, what is the status of a whitepaper presenting the research plan? If not, what are the plans for ramping down the current project? Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 7

Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 8

Plan Name FY16 FY17 Deployments FY18 Deployments FY19 Deployment Former Baseline JLab JLab (FY16 options) FNAL FNAL (FY18 options) 3-Site Cluster Hosting Baseline JLab 1/3 JLab (FY16 options) 2/3 BNL 2/3 BNL (FY17 options) 1/3 FNAL (initiate procurement) FNAL (execute procurement) 3-Site Cluster Hosting revised Acquisition Schedule Split 4 acquisition budget years across 3 sites Constraint: Maintain same level of delivered computing 40-node allocation on BNL-IC (K80 GPUs) Production 1/4/2017. Allocation through end FY19 40 nodes is time-averaged. Can be more or less anytime. Not traditional acquisition, but adds computing to portfolio Also, implementing access to storage, tape archive there Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 9

JLab: ~1/3 of Computing Acquisition Funds Options purchase based on FY16 acquisition contract. Expanded 16p to 256 KNL nodes (plus spares) very early in FY17. BNL: ~2/3 of Computing Acquisition Funds Led by Bob Mawhinney, Alex Zaytsev. Details: Bob M s Saturday talk Acquisition team working with Acquisition Review Committee FY17 Acquisition Review Committee formed earlier this year Review proposed FY17 (BNL) computing hardware acquisition plan Chair: Rob Kennedy Focus: develop more USQCD-specific software benchmarks for RFP process Members include Site Architects, Site Managers, Collaboration Reps: Carleton Detar, Steve Gottlieb, Chulwoo Jung, James Osborn, Frank Winter Draft report available May 17. Early Notables from Acquisition team: Target job size range: jobs using up to ~16 nodes Dual-rail with KNL is not cost-effective vs Single-rail KNL for target job sizes SPC: much higher over-request % for CPU and KNL than for GPUs Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 10

BNL: ~2/3 of Computing Acquisition Funds Options purchase based on FY17 acquisition contract. Most likely, this will lead to more of the FY17 choice. FNAL ~1/3 of Computing Acquisition Funds Hold this portion of FY18 funds for a purchase in FY19. Initiate the FY18-FY19 acquisition process in FY18. Take as far as possible without FY19 funds on hand. FY19 Funds arrive: FNAL executes FY18/19 RFP ASAP for early deployment of FY19 computing. Plans for FY20 and later operations may impact this. Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 11

Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 12

The FY16 User Survey: Measured user satisfaction from October 2015 through September 2016 Survey open from through December 16, 2016 to March 10, 2017 Same format as in recent years, 29 questions designed to measure satisfaction with: LQCD Compute Facilities USQCD Resource Allocation Process The User Survey was distributed to all scientific members of USQCD Responses were received from 73 individuals vs. 66 in FY15 26 of 27 PI s responded: 96% response rate vs. 86% in FY15 33 of 50 most Active Users responded: 66% response rate vs. 50% in FY15 FY16 overall satisfaction rating with Compute Facilities = 93% Exceeds LQCD Computing Project KPI goal of 92%. Was 97% in FY15. FY16 overall satisfaction rating with Resource Allocation Process = 85% Down from FY15 s rating (91%). At the level in FY11,12,14 (ratings in mid-80 s). Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 13

User Comment Topics: suggested by >= 2 user comments LQCD: User Documentation at BNL, JLab action plan documented LQCD: Simplify Moving Projects from Site to Site - discussing USQCD: Concern about turn-around time for Class B, C proposals discussing USQCD: Link between science priorities, top allocations, outcomes discussing User Survey Report: near-final draft but not final yet. Please, talk to Bill or Rob at break if you have comments. Still time to provide input to report. And you can always send email to Bill or Rob do not have to wait for annual survey. Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 14

Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 15

Please plan how to preserve your data after your allocation ends. Project developed a Policy on Data Preservation Sites may implement policy a little differently to adapt to local environment Data Preservation Policy for Disk Storage Disk data that is not covered by a storage allocation and not community-owned can be moved to tape after 1 month from the end of your allocation at a site s discretion unless prior arrangements have been made. Data Preservation Policy for Tape Storage Tape data that is not community-owned and not used for 3 years after your allocation ends can be shelved at a site s discretion unless prior arrangements have made. Related: Managing Data Storage Sites fit current project allocations AND community data in available resources. Community Data status as defined by USQCD-EC. Policy empowers sites to clean up data from past allocations that consumes resources Sites have had to scale down past disk, tape allocations to fit available resources More tape and/or disk storage resources = less CPU and/or memory resources. Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 16

Boroski / Kennedy, Report from the Project Office, All-Hands Meeting, Apr 28-29, 2017 17