Fast computers, big/fast storage, fast networks

Marla Meehl
Manager, Network Engineering and Telecommunications, NCAR/UCAR; Manager of the Front Range GigaPoP
Computational & Information Systems Laboratory, National Center for Atmospheric Research
History of Supercomputing at NCAR (as of 27 Oct '09)

[Timeline chart, 1960-2015, spanning the Marine St., Mesa Lab I, Mesa Lab II, and NWSC facilities. Legend: Production Systems; Experimental/Test Systems that became Production Systems; Experimental/Test Systems. Red text indicates systems currently in operation within the NCAR Computational and Information Systems Laboratory's computing facility. 35 systems over 35 years.]

Systems shown (most recent first):
- new system @ NWSC
- IBM p6 p575/4096 (bluefire)
- IBM p6 p575/192 (firefly)
- IBM p5+ p575/1744 (blueice)
- IBM p5 p575/624 (bluevista)
- Aspen Nocona-IB/40 (coral)
- IBM BlueGene-L/8192 (frost)
- IBM BlueGene-L/2048 (frost)
- IBM e1350/140 (pegasus)
- IBM e1350/264 (lightning)
- IBM p4 p690-f/64 (thunder)
- IBM p4 p690-c/1600 (bluesky)
- IBM p4 p690-c/1216 (bluesky)
- SGI Origin 3800/128 (tempest)
- Compaq ES40/36 (prospect)
- IBM p3 WH2/1308 (blackforest)
- IBM p3 WH2/604 (blackforest)
- IBM p3 WH1/296 (blackforest)
- Linux Networx Pentium-II/16 (tevye)
- SGI Origin2000/128 (ute)
- Cray J90se/24 (chipeta)
- HP SPP-2000/64 (sioux)
- Cray C90/16 (antero)
- Cray J90se/24 (ouray)
- Cray J90/20 (aztec)
- Cray J90/16 (paiute)
- Cray T3D/128 (T3)
- Cray T3D/64 (T3)
- Cray Y-MP/8I (antero)
- CCC Cray 3/4 (graywolf)
- IBM SP1/8 (eaglesnest)
- TMC CM5/32 (littlebear)
- IBM RS/6000 Cluster (CL)
- Cray Y-MP/2 (castle)
- Cray Y-MP/8 (shavano)
- TMC CM2/8192 (capitol)
- Cray X-MP/4 (CX)
- Cray 1-A S/N 14 (CA)
- Cray 1-A S/N 3 (C1)
- CDC 7600
- CDC 6600
- CDC 3600
Current HPC Systems at NCAR

System Name    Vendor  System        Frames  Processors  Processor Type  GHz  Peak TFLOPs

Production Systems:
bluefire (IB)  IBM     p6 Power 575  11      4096        POWER6          4.7  77.000

Research Systems, Divisional Systems & Test Systems:
firefly (IB)   IBM     p6 Power 575  1       192         POWER6          4.7  3.610
frost (BG/L)   IBM     BlueGene/L    4       8192        PowerPC-440     0.7  22.935
High Utilization, Low Queue Wait Times

[Chart: Bluefire system utilization (daily average), Nov-08 through Nov-09, with annotations for the Spring ML powerdown and the May 3 firmware & software upgrade. Chart: number of jobs per queue-wait time bin (<2m through >2d) for the Premium, Regular, Economy, and Standby queues.]

- Bluefire system utilization is routinely >90%
- Average queue-wait time is <1 hr

Average queue-wait times for user jobs at the NCAR/CISL Computing Facility (all jobs):

Queue        bluefire      lightning     blueice        bluevista
Period       since Jun'08  since Dec'04  Jan'07-Jun'08  Jan'06-Sep'08
Peak TFLOPs  >77.0         1.1           12.2           4.4
Premium      00:05         00:11         00:11          00:12
Regular      00:36         00:16         00:37          01:36
Economy      01:52         01:10         03:37          03:51
Stand-by     01:18         00:55         02:41          02:14
Bluefire Usage by Discipline

- Climate: 49.3%
- Accelerated Scientific Discovery: 18.0%
- Weather: 10%
- Astrophysics: 4.4%
- Oceanography: 4.0%
- Atmospheric Chemistry: 3.5%
- Basic Fluid Dynamics: 2.6%
- Cloud Physics: 1.3%
- Upper Atmosphere: 0.3%
- Miscellaneous: 0.3%
CURRENT NCAR COMPUTING: >100 TFLOPs

[Chart: Peak TFLOPs at NCAR (all systems), Jan-00 through Jan-10, annotated with the ARCS Phase 1-4 and ICESS Phase 1-2 procurements. Systems shown: IBM POWER6/Power575/IB (bluefire), IBM POWER6/Power575/IB (firefly), IBM POWER5+/p575/HPS (blueice), IBM POWER5/p575/HPS (bluevista), IBM BlueGene/L (frost), IBM Opteron/Linux (pegasus), IBM Opteron/Linux (lightning), IBM POWER4/Federation (thunder), IBM POWER4/Colony (bluesky), IBM POWER4 (bluedawn), SGI Origin3800/128, IBM POWER3 (blackforest), IBM POWER3 (babyblue).]
CISL Computing Resource Users: 1,347 users in FY09

- Graduate and undergraduate students: 343 (25%)
- University research associates: 280 (21%)
- NCAR associate scientists & support staff: 252 (19%)
- University faculty: 191 (14%)
- NCAR scientists & project scientists: 160 (12%)
- Government: 108 (8%)
- Other: 13 (1%)
Universities with the Largest Number of Users in FY09

[Bar chart: number of users per university, 0-120 scale.] Users are from 114 U.S. universities.
Disk Storage Systems

Current disk storage for HPC and data collections:
- 215 TB for bluefire (GPFS)
- 335 TB for Data Analysis and Visualization (DAV) (GPFS)
- 110 TB for frost (local)
- 152 TB for data collections (RDA, ESG) (SAN)

Undergoing extensive redesign and expansion:
- Add ~2 PB to the existing GPFS system to create a large central (cross-mounted) filesystem
Data Services Redesign

Near-term goals:
- Creation of a unified and consistent data environment for NCAR HPC
- High-performance availability of the central filesystem from many projects/systems (RDA, ESG, CAVS, Bluefire, TeraGrid)

Longer-term goals:
- Filesystem cross-mounting
- Global WAN filesystems

Schedule:
- Phase 1: Add 300 TB (now)
- Phase 2: Add 1 PB (March 2010)
- Phase 3: Add 0.75 PB (TBD)
Mass Storage Systems

- NCAR MSS (Mass Storage System), in production since the 1970s
- Transitioning from the NCAR MSS to HPSS prior to NCAR-Wyoming Supercomputer Center (NWSC) commissioning in 2012
  - Cutover January 2011
  - Transition without data migration

Tape drives/silos:
- Sun/StorageTek T10000B drives (1 TB media) with SL8500 libraries
- Phasing out tape silos Q2 2010

"Oozing" the archive data:
- 4 PB unique; 6 PB with duplicates
- Online process with optimized data transfer
- Average 20 TB/day with 10 streams
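A back-of-the-envelope check of the transfer figures above: at the stated average of 20 TB/day, "oozing" the archive takes on the order of half a year to a year. The helper below is a sketch using decimal units (1 PB = 1000 TB), an assumption since the slides do not state the unit convention.

```python
PB_TO_TB = 1000  # assuming decimal (SI) units; not stated in the slides

def migration_days(total_pb: float, rate_tb_per_day: float) -> float:
    """Days needed to transfer total_pb petabytes at a steady daily rate."""
    return total_pb * PB_TO_TB / rate_tb_per_day

print(migration_days(4, 20))  # 200.0 days for the 4 PB of unique data
print(migration_days(6, 20))  # 300.0 days including duplicates
```

This assumes the 20 TB/day average holds continuously; real throughput would vary with tape drive contention and stream count.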
Estimated Growth

[Chart: Total PBs stored (with DSS stewardship & IPCC AR5), Oct '07 through Oct '12, 0-35 PB scale.]
Facility Challenge Beyond 2011

- The NCAR data center's limits of power, cooling, and space have been reached
- Any future computing augmentation will exceed existing center capacity
- Planning in progress:
  - Partnership
  - Focus on sustainability and efficiency
  - Project timeline
NCAR-Wyoming Supercomputer Center (NWSC) Partners

- NCAR & UCAR
- University of Wyoming
- State of Wyoming
- Cheyenne LEADS
- Wyoming Business Council
- Cheyenne Light, Fuel & Power Company
- National Science Foundation (NSF)

http://cisl.ucar.edu/nwsc/

[Map: NWSC site in Cheyenne, shown relative to the University of Wyoming and the NCAR Mesa Lab.]
Focus on Sustainability

- Maximum energy efficiency
- LEED certification
- Achievement of the smallest possible carbon footprint
- Adaptable to the ever-changing landscape of High-Performance Computing
- Modular and expandable space that can be adapted as program needs or technology demands dictate
Focus On Efficiency

- Don't fight Mother Nature
- The cleanest electron is an electron never used
- The computer systems drive all of the overhead loads
- NCAR will evaluate sustained performance per kW in procurement
- Focus on the biggest losses: compressor-based cooling, UPS losses, transformer losses

Load breakdown:
- Typical modern data center: IT load 65.9%; cooling 15.0%; electrical losses 13.0%; fans 6.0%; lights 0.1%
- NWSC design: IT load 91.9%
Design Progress

- Will be one of the most efficient data centers
- Cheyenne's climate eliminates refrigerant-based cooling for ~98% of the year
- Efficient water use
- Minimal electrical transformation steps
- Guaranteed 10% renewable power, with an option for 100% (wind power)
- Power Usage Effectiveness (PUE):
  - PUE = Total Load / Computer Load
  - NWSC design: ~1.08-1.10 for Wyoming
  - Good data centers: ~1.5; typical commercial: ~2.0
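The PUE figures above follow directly from the formula on the slide. A minimal sketch, using illustrative load numbers (the 4,950/4,500 kW figures are assumptions for the example, not NWSC measurements):

```python
def pue(total_load_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness = total facility load / IT load.
    Lower is better; 1.0 would mean zero overhead."""
    return total_load_kw / it_load_kw

# A hypothetical IT load of 4,500 kW with ~10% facility overhead
# lands at the ~1.10 end of the NWSC design target:
print(round(pue(total_load_kw=4950.0, it_load_kw=4500.0), 2))  # 1.1
```

By comparison, a "typical commercial" PUE of 2.0 means the facility burns as much power on cooling, fans, and electrical losses as on computing itself.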
Expandable and Modular

Long term:
- 24-acre site
- Dual substation electrical feeds
- Utility commitment up to 36 MW
- Substantial fiber-optic capability

Medium term:
- Phase I / Phase II can be doubled
- Enough structure for three or four generations
- Future container solutions or other advances

Short term:
- Module A and Module B, each 12,000 sq. ft.
- Install 4.5 MW initially; can be doubled in Module A
- Upgrade cooling and electrical systems in 4.5 MW increments
Project Timeline

- 65% design review complete (October 2009)
- 95% design complete
- 100% design complete (Feb 2010)
- Anticipated groundbreaking (June 2010)
- Certification of occupancy (October 2011)
- Petascale system in production (January 2012)
NCAR-Wyoming Supercomputer Center
HPC Services at NWSC

- PetaFlop computing: 1 PFLOP peak minimum
- 100s-of-petabyte archive (+20 PB yearly growth)
  - HPSS; tape archive hardware acquisition TBD
- Petabyte shared file system(s) (5-15 PB)
  - GPFS/HPSS HSM integration (or Lustre)
- Data analysis and visualization
- Cross-mounting to NCAR research labs and others
- Research Data Archive data servers and data portals
- High-speed networking and some enterprise services
NWSC HPC Projection

[Chart: Peak PFLOPs at NCAR, Jan-04 through Jan-14, 0-2.0 PFLOPs scale, showing the projected NWSC HPC system (with uncertainty band) alongside IBM POWER6/Power575 (bluefire), IBM POWER5+/p575 (blueice), IBM POWER5/p575 (bluevista), IBM BlueGene/L (frost), IBM Opteron/Linux (pegasus), IBM Opteron/Linux (lightning), IBM POWER4/Colony (bluesky), and the ARCS Phase 4 and ICESS Phase 1-2 procurements.]
NCAR Data Archive Projection

[Chart: Total data in the NCAR archive in petabytes (actual and projected), total and unique, Jan-99 through Jan-15, 0-120 PB scale.]
Tentative Procurement Timeline (Jul-09 through Jan-12)

- NDAs
- RFI
- RFI evaluation
- RFP draft
- RFP refinement
- Benchmark refinement
- RFP release
- Proposal evaluation
- Negotiations
- NSF approval
- Award
- Installation & ATP
- NWSC production
Front Range GigaPoP (FRGP)

- 10 years of operation by UCAR
- 15 members
- NLR, I2, ESnet peering @ 10 Gbps
- Commodity: Qwest and Level3
- CPS and TransitRail peering
- Intra-FRGP peering
- Akamai
- 10 Gbps ESnet peering
- www.frgp.net
Fiber, fiber, fiber

- NCAR/UCAR intra-campus fiber: 10 Gbps backbone
- Boulder Research and Administration Network (BRAN)
- Bi-State Optical Network (BiSON): upgrading to 10/40/100 Gbps-capable
  - Enables 10 Gbps NCAR/UCAR TeraGrid connection
  - Enables 10 Gbps NOAA/DC link via NLR lifetime lambda
- DREAM: 10 Gbps
- SCONE: 1 Gbps
Western Regional Network (WRN)

- A multi-state partnership to ensure robust, advanced, high-speed networking is available for research, education, and related uses
- Increased aggregate bandwidth
- Decreased costs
Questions