Data Intensive Science Impact on Networks

Data Intensive Science Impact on Networks
Eli Dart, Network Engineer, ESnet Network Engineering Group
IEEE Bandwidth Assessment Ad Hoc, December 13, 2011

Outline
- Data intensive science examples
- Collaboration and traffic profile
- Future landscape

Large Hadron Collider - ATLAS
- Large data sets (transfers of tens of terabytes are routine)
- Automated data distribution over multiple continents
- Large data rates
  - ~1 Petabyte per second from the instrument
  - Multi-stage trigger farm reduces this to ~200-400 MB/sec
  - Additional data from event reconstruction
- Large-scale distribution of data to international collaboration
  - 10-40 Gbps out to large repositories in Europe, North America, and Asia
  - 5-10 Gbps to analysis centers worldwide
- This will increase over time as the LHC is upgraded
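
As a rough illustration of the transfer figures above, the short Python sketch below estimates how long a routine multi-terabyte ATLAS dataset takes to move at the quoted WAN rates. The 30 TB dataset size is an assumed example for illustration, not a figure from the slides.

```python
def transfer_time_hours(size_terabytes, rate_gbps):
    """Estimate wall-clock hours to move a dataset at a sustained line rate."""
    size_bits = size_terabytes * 1e12 * 8   # decimal terabytes -> bits
    rate_bps = rate_gbps * 1e9              # Gbps -> bits per second
    return size_bits / rate_bps / 3600

# Assumed example: a 30 TB dataset ("transfers of tens of terabytes are routine")
for rate in (5, 10, 40):
    print(f"30 TB at {rate} Gbps: {transfer_time_hours(30, rate):.1f} hours")
```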

Genomics
- Genome sequencing is in its infancy
- Already seeing significant increase in data rates
- Increases coming from two directions
  - Per-instrument data rate increasing significantly (~10x over 5 years)
  - Cost of sequencers plummeting (10x over 5 years)
  - Human genome sequencing cost $10,500 in July 2011, down from $8.9 million in July 2007 (NYTimes)
- Wide variety of applications for genomics data as science improves, applications are discovered, etc.
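
A quick back-of-the-envelope check of the sequencing-cost trend cited above; the annualized rate is derived here as an arithmetic sketch and is not stated on the slide.

```python
# Annualized decline in human genome sequencing cost, from the figures cited above:
# $8.9 million (July 2007) down to $10,500 (July 2011), i.e. over 4 years.
cost_2007 = 8.9e6
cost_2011 = 10_500
years = 4

annual_factor = (cost_2011 / cost_2007) ** (1 / years)
print(f"Cost retained per year: {annual_factor:.3f}")          # ~0.19
print(f"Roughly {1 / annual_factor:.1f}x cheaper each year")   # ~5x per year
```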

Instruments
- Many instruments used in basic research are essentially high-resolution digital cameras
- The data rates from these instruments are increasing with the capabilities of the instruments
- Some instruments in development will be able to collect terabits per second of data
  - There are not enough I/O pins on the chip to get all the data out
  - On-chip data reduction will be necessary
- Transfer or streaming of data to computing resources will be necessary
  - ~2.5 Gbps today, significant growth curve
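
To make the scale concrete, the sketch below works out the on-chip reduction factor implied by a detector that produces terabits per second but can only stream a few gigabits per second off the chip. The 1 Tbps sensor rate is an assumed round number; the 2.5 Gbps figure is from the slide.

```python
# Assumed example: a 1 Tbps sensor readout versus the ~2.5 Gbps streaming rate
# quoted on the slide. The required on-chip reduction factor is simply the ratio.
sensor_rate_gbps = 1_000      # 1 terabit per second (assumed round figure)
offchip_rate_gbps = 2.5       # ~2.5 Gbps streaming rate from the slide

reduction = sensor_rate_gbps / offchip_rate_gbps
print(f"On-chip data reduction needed: ~{reduction:.0f}x")   # ~400x
```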

Futures: Square Kilometer Array
- Large radio telescope in the Southern Hemisphere
- Approximately one square kilometer of combined signal collection area
- ~2800 receivers in the telescope array
  - Most receivers in a 180 km diameter area
  - Average 100 km run to central correlator
- ~2 Petabytes per second at the central correlator
- Distribution of data to international collaborators
  - Expected rate of ~100 Gbps from correlator to analysis centers
  - International collaboration-wide data distribution
- There are others (sensor networks, ITER, etc.)
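
The gap between the correlator input rate and the expected WAN rate above implies an enormous in-facility reduction; a minimal sketch of that ratio, using only the figures quoted on the slide:

```python
# Ratio of SKA correlator input (~2 PB/s) to the expected WAN rate to analysis
# centers (~100 Gbps), both taken from the slide above.
correlator_in_bps = 2e15 * 8      # 2 petabytes/s -> bits/s
wan_out_bps = 100e9               # 100 Gbps

print(f"Data reduction before the WAN: ~{correlator_in_bps / wan_out_bps:,.0f}x")
# ~160,000x: almost all reduction must happen at or near the instrument.
```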

Collaboration Structures
- The very structure of modern science assumes there is a network interconnecting all parts of the collaboration
- Large, unique facilities (e.g. LHC) provide a focus for all members of a field
  - Data distributed to scientists
  - Results distributed among collaborators
- Data analysis using local resources also drives data movement
  - Example: large simulation run at a supercomputer center, secondary analysis at the home institution
- Large data sets + increasing scope of collaboration
  - Scientific productivity gated on data analysis
  - Data moved to analysis, analysis moved to data: both must be supported in the general case

The Science Data Growth

Networks For Data Intensive Science
- What does data intensive science traffic look like?
- For a given bandwidth, much larger per-flow rates, much smaller flow count
  - Often a significant fraction (10-50%) of link bandwidth in a single flow (often over intercontinental distances)
- TCP is showing its limitations
  - Loss sensitivity
  - CPU load
- IMIX traffic profile is not a good approximation for science traffic
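
One way to make the loss-sensitivity point concrete is the well-known Mathis et al. steady-state TCP throughput estimate, rate ≈ MSS / (RTT · sqrt(p)). The sketch below applies it to an assumed intercontinental path; the MSS, RTT, and loss-rate values are illustrative assumptions, not figures from the slides.

```python
import math

def mathis_throughput_gbps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. steady-state TCP throughput estimate: MSS / (RTT * sqrt(p))."""
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate)) / 1e9

# Assumed intercontinental path: ~1460-byte MSS (1500-byte MTU), 150 ms RTT.
for p in (1e-4, 1e-6, 1e-8):
    gbps = mathis_throughput_gbps(1460, 0.150, p)
    print(f"loss rate {p:.0e}: ~{gbps:.3f} Gbps per flow")
# Even tiny loss rates cap a single long-distance TCP flow far below 10 Gbps.
```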

Futures
- Data rates will continue to increase
  - Sensor data rate scales with semiconductor capabilities (think digital cameras)
  - Large facilities will fan out data to large collaborations
- Host architecture means new protocols are likely
  - Per-flow packet processing (e.g. TCP) is essentially a serial task, and per-core clock rate is essentially flat
  - Faced with exponential growth in data rates, what do we do?
  - Different means of getting data into host memory (e.g. RDMA over Ethernet) are being tested today
- Again, IMIX traffic is not a good model here
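
A minimal sketch of the serial-processing argument above: if one core can only sustain a fixed per-flow rate, filling a fast link requires spreading traffic across many flows (and cores). The per-core rate and link speeds below are assumed illustrative values.

```python
import math

# Assumed illustrative values: a single core sustaining ~10 Gbps of per-flow
# TCP processing, and a 100 Gbps or 400 Gbps link to fill.
per_core_flow_gbps = 10

for link_gbps in (100, 400):
    flows_needed = math.ceil(link_gbps / per_core_flow_gbps)
    print(f"{link_gbps} Gbps link: at least {flows_needed} parallel flows/cores")
# Data intensive science often wants the opposite: one or a few very large flows,
# which is why host architecture pushes toward new protocols and RDMA-style I/O.
```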

Impact on Networks
- Science networks will continue to see a different traffic profile
  - Relatively small flow count, relatively large flow rate
  - IMIX traffic assumptions won't hold
  - Beware LAG-like implementations!
- Protocol space is likely to change
  - Data mobility requirements will drive new modes of operation
  - Even if it's standard, it probably won't be all TCP (Ethernet, or IP, or UDP, with something above for reliability)
- Services are important
  - Predictability and programmability are important (e.g. low loss, OpenFlow)
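
The LAG warning above is about hash-based load balancing: with many small flows a per-flow hash spreads traffic evenly, but with a handful of elephant flows each flow is pinned to a single member link, so one link can saturate while others sit idle. A toy simulation under assumed flow counts and sizes:

```python
import random

def lag_link_loads(flow_rates_gbps, n_links=4, seed=1):
    """Toy model of per-flow hashing onto a LAG: each flow sticks to one member link."""
    random.seed(seed)
    loads = [0.0] * n_links
    for rate in flow_rates_gbps:
        loads[random.randrange(n_links)] += rate   # stand-in for a 5-tuple hash
    return loads

# Assumed traffic mixes: many 10 Mbps flows vs. a few 10 Gbps elephant flows.
many_small = [0.01] * 4000
few_large = [10.0] * 4

print("many small flows:", [f"{l:.1f}" for l in lag_link_loads(many_small)])
print("few large flows: ", [f"{l:.1f}" for l in lag_link_loads(few_large)])
# Small flows spread roughly evenly; large flows can pile onto one member link.
```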

Questions? Thanks!
Eli Dart - dart@es.net
http://www.es.net/
http://fasterdata.es.net/