Network Support for Data Intensive Science

Network Support for Data Intensive Science
Eli Dart, Network Engineer
ESnet Network Engineering Group
ARN2 Workshop
Washington, DC
April 18, 2013

Overview
- Drivers
- Sociology
- Path Forward

Exponential Drivers
We live in a world governed by exponentials:
- Genomics data set growth (~5x/year)
- Moore's Law (~2x/18 months): sensors/detectors, computing, network devices
There is an approximate balance of sorts:
- Challenge: data growth
- Response: data transport, storage, and analysis
The balance isn't perfect. If the response side flattens out, it's time to get worried.
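To make the mismatch concrete, here is a minimal back-of-envelope sketch (not from the talk; the ~5x/year and ~2x/18-months figures are taken from the bullets above) comparing how the two curves diverge over a few years:

```python
# Rough comparison of the exponential rates cited above (a back-of-envelope
# sketch, not a model from the talk). Growth factors are per year.
GENOMICS_PER_YEAR = 5.0            # ~5x per year (data volume)
MOORE_PER_YEAR = 2.0 ** (12 / 18)  # ~2x per 18 months ~= 1.59x per year

for year in range(1, 6):
    data = GENOMICS_PER_YEAR ** year
    capacity = MOORE_PER_YEAR ** year
    print(f"year {year}: data x{data:,.0f}, compute/transport x{capacity:.1f}, "
          f"gap x{data / capacity:.0f}")
```

Even on these rough numbers the gap compounds by roughly 3x per year, which is the point of the slide: if transport, storage, and analysis stop tracking the data curve, the imbalance grows quickly.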

Non-Exponential Components Matter
Several important parts of the science ecosystem are not exponential:
- Money
- People (including policies, which are rules among people)
- Protocols
Some of this is accounted for in the balance (e.g. money), but people and protocols change very slowly. We are currently at a point where the non-exponential components need to change, too.

Paradigm Shift
The transition to data-intensive science is about more than data and science: people need to change how they think, and learning new things takes time. This is true of science collaborations as well as of individual scientists.
- A few are able to adapt on their own
- Most need to import expertise
Patterns have emerged from contact with a wide variety of collaborations.

Rough User Grouping By Data Set Size
[Chart: collaboration scale (x-axis) vs. data set size (y-axis, ~10GB to 100PB). Small collaborations, e.g. light and neutron sources, sit at the small-data end; medium collaborations, e.g. HPC codes, in the middle; large collaborations, e.g. the LHC, at the large-data end. A few large collaborations have internal software and networking organizations.]

Rough User Grouping Discussion (1)
The chart is a crude generalization. It is not meant to describe specific collaborations, but to illustrate some common aspects of many collaborations. Data sets are constantly growing (the lines smear to the right).
Small-data instrument science:
- Light sources, microscopy, nanoscience centers, etc.
- Typically a small number of scientists per collaboration, but many, many collaborations
- Individual collaborations typically rely on site support and grad students
- This group typically has difficulty moving data via the network
- Science DMZs and Data Transfer Nodes (especially if deployed with Globus Online) are starting to help; a rough transfer-time sketch follows below
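To illustrate why "difficulty moving data" matters at these scales, a minimal transfer-time sketch (not from the talk; the data set sizes come from the chart above, while the link speeds and the ~30% effective-utilization figure are illustrative assumptions):

```python
# Back-of-envelope transfer times for data set sizes from the chart above.
# Link speeds and the assumed end-to-end efficiency are illustrative only.
SIZES_TB = [1, 10, 100, 1000]   # 1 TB .. 1 PB
LINKS_GBPS = [1, 10, 100]       # nominal link speeds
EFFICIENCY = 0.3                # assume ~30% of line rate end to end

for size_tb in SIZES_TB:
    bits = size_tb * 1e12 * 8
    times = []
    for gbps in LINKS_GBPS:
        hours = bits / (gbps * 1e9 * EFFICIENCY) / 3600
        times.append(f"{gbps}G: {hours:,.1f} h")
    print(f"{size_tb:>5} TB  " + "  ".join(times))
```

Even a 10 TB data set takes roughly three days on a poorly performing 1 Gbps path under these assumptions, which is why dedicated Science DMZ paths and DTNs make such a difference for these groups.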

Rough User Grouping Discussion (2)
Supercomputer simulation science:
- Climate, fusion, bioinformatics, computational astrophysics, etc.
- Larger collaborations, often multi-site
- Reliant on supercomputer center staff, or on grad students, for help with network issues
- This group typically has difficulty transferring data via the network
  - Many users still want to use HPSS directly (often performs poorly)
  - Data Transfer Nodes are starting to help, especially when deployed with Globus Online (see the sketch below)
Large-data instrument science (HEP, NP):
- Very large collaborations: multi-institution, multi-nation-state
- Collaborations have their own software and networking shops
- Typically able to use the network well, in some cases expertly
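As an illustration of the DTN-plus-Globus pattern, a minimal sketch using the current Globus Python SDK (the talk predates this SDK and refers to Globus Online; the endpoint UUIDs, paths, and token handling below are placeholder assumptions, not anything from the talk):

```python
import globus_sdk

# Placeholder values -- real endpoint UUIDs, paths, and an OAuth2 transfer
# token would come from your Globus account and your site's DTN setup.
SRC_ENDPOINT = "<source-DTN-endpoint-uuid>"
DST_ENDPOINT = "<destination-DTN-endpoint-uuid>"
TRANSFER_TOKEN = "<globus-transfer-access-token>"

# Submit an asynchronous DTN-to-DTN transfer; the service handles retries,
# integrity checks, and notification instead of the user babysitting scp.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
)
tdata = globus_sdk.TransferData(
    tc, SRC_ENDPOINT, DST_ENDPOINT, label="simulation output to archive DTN"
)
tdata.add_item("/scratch/run042/", "/project/archive/run042/", recursive=True)
task = tc.submit_transfer(tdata)
print("submitted task:", task["task_id"])
```

The point is less the specific API than the pattern: the transfer runs between dedicated DTNs on Science DMZ paths, and the user hands off the job rather than shepherding it.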

Networks Depend On Others
We all know that networking is end-to-end. What this really means is that we are interdependent:
- I can't succeed if you fail
- You can't succeed if I don't do my part
- Our users must succeed for us to be viewed as successful
The Science DMZ model helps, and people are fixing the edges, but there is more to do. We must proactively help our constituents succeed.

Differentiation
How does a science network differentiate itself?
- Make the impossible possible
- Make the difficult routine
The commodity world is advancing relentlessly (see exponentials); however, the capability-class niche is probably not worth its time. As long as the U.S. is interested in scientific leadership, science networks will be necessary. I believe this dynamic holds for both regional and national networks.
- We know our constituents
- We can innovate if we maintain the flexibility to do so
- We can build capability-class solutions for specific purposes
These solutions don't need to scale to 100M users; if they scale to the right 10 experiments, there's a Nobel Prize. However, this won't happen by itself: we must shape our own destiny.

Questions? Thanks!
Eli Dart - dart@es.net
http://www.es.net/
http://fasterdata.es.net/