Towards a Strategy for Data Sciences at UW

Towards a Strategy for Data Sciences at UW
Albrecht Karle, Department of Physics, June 2017
- High performance computing infrastructure: perspectives from Physics
- Existing infrastructure and projected future needs
- Other areas: computational astrophysics, quantum computing
- Cases for coordination: infrastructure, data management, machine learning

Computing in Large Physics
Big Science projects: LHC experiments and IceCube
- IceCube at the South Pole
- High-Luminosity Large Hadron Collider experiments: CMS and ATLAS

HL-LHC Challenge: Data!
Challenge:
- Increasing data volume (x10)
- Increasing event complexity (average pileup <µ> = 200)
- Increasing analysis sophistication: multivariate analysis techniques require more compute resources (see the sketch below)
- Moore's law topping out
- Budget limitations
R&D and planning:
- Beginning stages: an opportunity
- UW has tradition and expertise
- Wisconsin should make tackling the HL-LHC computing challenge part of its research program
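As a concrete, hypothetical illustration of the multivariate techniques referenced above, the sketch below trains a gradient-boosted decision tree on toy two-feature data with scikit-learn; the features, labels, and sizes are invented, and real analyses use experiment-specific frameworks with far larger inputs.

    # Minimal sketch: a multivariate (BDT-style) event classifier on toy data.
    # Features, labels, and sizes are hypothetical; real analyses are far larger.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 10_000
    signal = rng.normal(loc=[1.0, 1.0], scale=0.8, size=(n, 2))      # toy "signal" events
    background = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))  # toy "background" events
    X = np.vstack([signal, background])
    y = np.concatenate([np.ones(n), np.zeros(n)])

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))

Scaling this kind of training from toy data to billions of events with many input features is part of why analysis compute needs grow alongside the raw data volume.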

CMS Computing Systems (Sridhara Dasu, Wisconsin, 6/21/17)

Tier   US   Total
T1      1       7
T2      7      52
T3     30      63

- Raw data acquisition and prompt processing take place at Tier-0.
- Reconstruction takes place primarily at Tier-1.
- Analysis takes place primarily at Tier-2 and Tier-3.

Current scale: 500K analysis jobs running concurrently; 500 PB of storage in use worldwide.

Wisconsin CMS Tier-2 Facility
Needs to scale up by an order of magnitude for HL-LHC!

WIPAC (IceCube) Computing Facilities
UW has been the lead institution for IceCube construction and operation, funded largely by the NSF. UW's WIPAC is the host institution and performs the primary data processing and data management for the IceCube collaboration.
Overall infrastructure deployed across the various locations:
- GPU cluster (~450 GPUs): ~40% of power budget
- CPU cluster (~7,600 cores): ~30% of power budget
- Disk storage (~7 PB): ~20% of power budget
- Infrastructure: ~10% of power budget
Power, cooling, and space are currently the limiting factors.
Distributed computing: a key component of IceCube data analysis
- About 50% of the CPU/GPU capacity for IceCube is at UW-Madison; the rest is distributed worldwide.
- Transparent access to OSG, XSEDE, etc. through HTCondor-related technologies is a recognized advantage (a sketch of a submit description follows below).
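To illustrate how such distributed workloads are expressed, here is a minimal sketch of an HTCondor submit description; the executable, resource requests, and file names are hypothetical placeholders, not IceCube's actual production configuration.

    # sketch.sub: hypothetical HTCondor submit description for a GPU reconstruction job
    # (executable, resource requests, and file names are placeholders)
    universe       = vanilla
    executable     = run_reco.sh
    arguments      = $(Process)
    request_gpus   = 1
    request_cpus   = 1
    request_memory = 4 GB
    output         = reco_$(Process).out
    error          = reco_$(Process).err
    log            = reco.log
    # submit 100 instances, indexed by $(Process)
    queue 100

Because the same description can be matched to local cluster slots or routed to remote OSG/XSEDE resources through mechanisms such as flocking and glideins, users see a single, transparent pool.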

WIPAC (IceCube) Computing Facilities
- 222 W. Washington Ave. datacenter: core servers and data storage, ~55 kW IT; at the limit of space and cooling.
- Chamberlin Hall datacenters:
  - 3rd floor (3216): 13 racks, ~115 kW IT capacity. Services: CPU/GPU cluster, data storage.
  - 4th floor (4103): South Pole Test System, <10 kW IT; inefficient room cooling.
- WID/MIR: GZK9000 GPU cluster, first deployed in 2012, full upgrade in 2016; 1 rack, ~15 kW IT, shared with CHTC at 30/70%.
- Near future (6 months to 1 year): starting design for new WIPAC datacenter space at CS 2360. Goal: move the computing at 222 W. Washington into CS 2360, aiming for ~100 kW IT total at the new location.

Physics (all): Computing Requirements Overview
Current power needs:
- IceCube: ~200 kW
- HEP: ~160 kW
- Plasma: ~5 kW
- Department: ~2 kW

Total estimates with room for future expansion:

PODs   Space (sq ft)   Power (kW)   Water (gpm)
6-8    3200-4300       900-1200     540-720

5-10 year forward view: a larger share of compute could take place in the cloud, but
- data I/O intensive workloads still carry too high a cost in the cloud (an illustrative estimate follows below);
- data storage & repositories still need to be hosted locally.
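As a back-of-envelope illustration of the I/O cost concern, the sketch below assumes a hypothetical 1 PB per year of analysis products leaving a commercial cloud at an assumed list price of roughly $0.09/GB for egress (typical published pricing around 2017, not a quote).

    # Hypothetical egress-cost estimate; volumes and prices are assumptions for illustration only.
    egress_tb_per_year = 1000          # ~1 PB/year of data leaving the cloud (assumed)
    price_per_gb = 0.09                # USD per GB egress, typical list price circa 2017 (assumed)
    annual_cost = egress_tb_per_year * 1000 * price_per_gb
    print(f"~${annual_cost:,.0f} per year for egress alone")   # ~$90,000/year

Storage at rest and compute charges come on top of this, which is why data-intensive repositories remain on premises for now.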

Computational Astrophysics
Seen as an area of significant growth to enable new science:
- LSST (Large Synoptic Survey Telescope), a very big new telescope with a high data volume (15 TB/night); Keith Bechtol, faculty member starting next year
- Plasma astrophysics

Quantum Computing
A field of intense research and rapid development, still in its early stages; it is based on alternative concepts of computing rooted in quantum mechanical phenomena.
Applications:
- Machine learning
- Big data
- Pattern recognition (principal component analysis)
- Drug discovery
- Cybersecurity / quantum internet / factoring to break RSA encryption
- Quantum chemistry
- Quantum materials simulation and development
- ...
Players:
- Universities & government labs
- Fortune 500 companies: IBM, Google, Intel, Microsoft, Lockheed, ...
- Smaller companies/startups: D-Wave, IonQ, Rigetti, ...
~$5M/year in federally funded research: quantum dots, superconductors, neutral atoms

Quantum Computing Investments
Government investments:

Infrastructure
Physics hosts some of the largest data processing/analysis infrastructures on campus. We would benefit from participating in any campus discussions of common data infrastructure:
- Datacenters
- Networking: roadmap for campus LAN/WAN evolution
- Long-term archive services
- ...

Data Science Areas
Data Management: effort to develop and maintain consistent data management plans
- IceCube total data output is ~700-1000 TB per year.
- We need to host our own multi-petabyte repository.
- We leverage large tape facilities at collaborating sites (NERSC in the US and DESY-Zeuthen in Germany) for long-term archive and preservation.
- We are currently working on a data catalog project to ease discoverability of data products.
- We would benefit from participating in a campus forum where similar experiences, best practices, tools, etc. are shared across groups.
Machine Learning
- A growing area in IceCube: an increasing number of analyses are exploring the capabilities of new deep neural network tools for event selection and reconstruction (a sketch follows below).
- Access to a focused multidisciplinary expert group on campus could be beneficial.
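To give a flavor of these tools, the sketch below shows a minimal convolutional classifier for event selection in PyTorch; the input shape, layer sizes, and class count are hypothetical and do not reflect IceCube's actual networks.

    # Minimal sketch of a CNN event classifier (hypothetical architecture and input shape).
    import torch
    import torch.nn as nn

    class EventClassifier(nn.Module):
        def __init__(self, n_classes=2):
            super().__init__()
            # Treat a detector readout (e.g. a charge map) as a single-channel 2D image.
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(32, n_classes)

        def forward(self, x):
            h = self.features(x).flatten(1)
            return self.head(h)  # logits for signal vs. background

    # Example: one batch of 8 hypothetical 60x60 "detector images".
    model = EventClassifier()
    logits = model(torch.randn(8, 1, 60, 60))
    print(logits.shape)  # torch.Size([8, 2])

Training such networks over simulated and recorded events is GPU-intensive, which ties the machine learning effort back to the GPU infrastructure described above.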

Conclusion
Possible cases for coordination towards a strategy in data sciences:
- Infrastructure
- Data management
- Machine learning
- ...