The perfsonar Project at 10 Years: Status and Trajectory

Similar documents
perfsonar Deployment on ESnet

Network and Host Design to Facilitate High Performance Data Transfer

A Brief Overview of the Science DMZ

Achieving the Science DMZ

Introduction to perfsonar. RIPE SEE5, Tirana, Albania Szymon Trocha Poznań Supercomputing and Networking Center, Poland April 2016

Introduction to. Network Startup Resource Center. Partially adopted from materials by

WLCG Network Throughput WG

perfsonar Going Forward Eric Boyd, Internet2 Internet2 Technology Exchange September 27 th 2016

Developing Networking and Human Expertise in Support of International Science

HIDDEN SLIDE Summary These slides are meant to be used as is to give an upper level view of perfsonar for an audience that is not familiar with the

Installation & Basic Configuration

DICE Diagnostic Service

International Climate Network Working Group (ICNWG) Meeting

The new perfsonar: a global tool for global network monitoring

perfsonar psui in a multi-domain federated environment

perfsonar Update Jason Zurawski Internet2 March 5, 2009 The 27th APAN Meeting, Kaohsiung, Taiwan

DICE Network Diagnostic Services

Enhancing Infrastructure: Success Stories

perfsonar: A Look Ahead Andrew Lake, ESnet Mark Feit, Internet2 October 16, 2017

perfsonar ESCC Indianapolis IN

The Science DMZ Design Pattern

Embedded Network Systems. Internet2 Technology Exchange 2018 October, 2018 Eric Boyd Ed Colone

Connectivity Services, Autobahn and New Services

SLIDE 1 - COPYRIGHT 2015 ELEPHANT FLOWS IN THE ROOM: SCIENCEDMZ NATIONALLY DISTRIBUTED

Connect. Communicate. Collaborate. Click to edit Master title style. Using the perfsonar Visualisation Tools

Science DMZ Architecture

SaaS Providers. ThousandEyes for. Summary

Fermilab WAN Performance Analysis Methodology. Wenji Wu, Phil DeMar, Matt Crawford ESCC/Internet2 Joint Techs July 23, 2008

Internet2 Technology Update. Eric Boyd Deputy Technology Officer

CHAPTER 3 GRID MONITORING AND RESOURCE SELECTION

Network Management & Monitoring

Building high-performing infrastructures for research Workshop

Lab 1: Improving performance by LAN Hardware Upgrade

please study up before presenting

Philippe Laurens, Michigan State University, for USATLAS. Atlas Great Lakes Tier 2 collocated at MSU and the University of Michigan

The Abilene Observatory and Measurement Opportunities

NetAlly. Application Advisor. Distributed Sites and Applications. Monitor and troubleshoot end user application experience.

Event-Based Software-Defined Networking: Build a Secure Science DMZ

Deployment of a WLCG network monitoring infrastructure based on the perfsonar-ps technology

Programmable Information Highway (with no Traffic Jams)

ThousandEyes for. Application Delivery White Paper

Engagement With Scientific Facilities

Cisco ISR G2 Management Overview

Improving Network Infrastructure to Enable Large Scale Scientific Data Flows and Collaboration (Award # ) Klara Jelinkova Joseph Ghobrial

REANNZ THE NREN FOR NEW ZEALAND RICHARD TUMALIUAN NETWORK ENGINEER TEIN4 NOC ANNUAL CONFERENCE 2015

CCIE SP Operations Written Exam v1.0

Chapter 13 TRANSPORT. Mobile Computing Winter 2005 / Overview. TCP Overview. TCP slow-start. Motivation Simple analysis Various TCP mechanisms

Multi-class Applications for Parallel Usage of a Guaranteed Rate and a Scavenger Service

Growth. Individual departments in a university buy LANs for their own machines and eventually want to interconnect with other campus LANs.

International Big Science Coming to Your Campus Soon (Sooner Than You Think )

Root-Cause Network Troubleshooting Optimizing the Process Tim Titus CTO, PathSolutions

Experiments on TCP Re-Ordering March 27 th 2017

Yealink VCS Network Deployment Solution

Research and Education Networking Ecosystem and NSRC Assistance

Georgia State University Cyberinfrastructure Plan

HyperIP : SRDF Application Note

Enabling a SuperFacility with Software Defined Networking

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

How PERTs Can Help With Network Performance Issues

perfsonar Host Hardware

The Science DMZ as a Security Architecture

ClientVantage Agentless What s New in Release 11.1

Maximizing visibility for your

Chapter 8. Network Troubleshooting. Part II

GÉANT Open Service Description. High Performance Interconnectivity to Support Advanced Research

Performance Update. Jeff Boote Senior Network Software Engineer Internet2 Martin Swany Assistant Professor University of Delaware

October 5 th 2010 NANOG 50 Jason Zurawski Internet2

TCP and BBR. Geoff Huston APNIC

perfsonar: Be-er knowledge, be-er networks, be-er investments

GÉANT Open Service Description. High Performance Interconnectivity to Support Advanced Research

Handling Topology Updates in a Dynamic Tool for Support of Bandwidth on Demand Service

NSF Campus Cyberinfrastructure and Cybersecurity for Cyberinfrastructure PI Workshop

Click About to Korneel edit Master title style. Sr Customer Engineering Architect Korneel Bullens Microsoft

Building a Digital Bridge from Ohio

Network Debugging Strategies

Network Design Considerations for Grid Computing

Deploying a Statewide Performance Measurement Infrastructure in North Carolina

Ethernet Operation Administration and Maintenance Deliverable 2010

Ashortage of IPv4 address space has

Understanding Your Network Using NetSage

CS419: Computer Networks. Lecture 10, Part 3: Apr 13, 2005 Transport: TCP performance

Intercontinental Multi-Domain Monitoring for LHC with perfsonar

PLEASE READ CAREFULLY BEFORE YOU START

Networking for Data Acquisition Systems. Fabrice Le Goff - 14/02/ ISOTDAQ

EGEE (JRA4) Loukik Kudarimoti DANTE. RIPE 51, Amsterdam, October 12 th, 2005 Enabling Grids for E-sciencE.

PLEASE READ CAREFULLY BEFORE YOU START

A High-Performance Storage and Ultra- High-Speed File Transfer Solution for Collaborative Life Sciences Research

15-441: Computer Networking. Wireless Networking

On Network Dimensioning Approach for the Internet

MaDDash: Monitoring and Debugging Dashboard

EqualLogic Storage and Non-Stacking Switches. Sizing and Configuration

CS 326: Operating Systems. Networking. Lecture 17

The Best Defense is a Good Offense Creating Networks That Work (the first time)

CCNA Exploration Network Fundamentals. Chapter 06 Addressing the Network IPv4

Chapter 4: network layer. Network service model. Two key network-layer functions. Network layer. Input port functions. Router architecture overview

The European DataGRID Production Testbed

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup

Service Definition Internet Service

Correlating Internet Performance Changes and Route Changes to Assist in Trouble-shooting from an End-user Perspective

Hybrid Control and Switched Systems. Lecture #17 Hybrid Systems Modeling of Communication Networks

Transcription:

With contributions from S. Balasubramanian, G. Bell, E. Dart, M. Hester, B. Johnston, A. Lake, E. Pouyoul, L. Rotman, B. Tierney and others @ ESnet The perfsonar Project at 10 Years: Status and Trajectory Jason Zurawski - ESnet Engineering & Outreach engage@es.net GN3 (GÉANT) NA3, Task 2 - Campus Network Monitoring and Security Workshop April 24 th 2014

Overview Introduction The Ghost of perfsonar Past perfsonar Present Use Cases Future Directions & Unfinished Business 2 ESnet Science Engagement (engage@es.net) - 4/21/14

Overarching Motivation Networks are an essential part of data-intensive science Connect data sources to data analysis Connect collaborators to each other Enable machine-consumable interfaces to data and analysis resources (e.g. portals), automation, scale Performance is critical Exponential data growth Constant human factors Data movement and data analysis must keep up Effective use of wide area (long-haul) networks by scientists has historically been difficult 3 ESnet Science Engagement (engage@es.net) - 4/21/14

ESnet Supports DOE Office of Science Universities DOE laboratories The Office of Science supports: 27,000 Ph.D.s, graduate students, undergraduates, engineers, and technicians 26,000 users of open-access facilities 300 leading academic institutions 17 DOE laboratories 4 ESnet Science Engagement (engage@es.net) - 4/21/14

ESnet s Vision Scientific progress is completely unconstrained by the physical location of instruments, people, computational resources, or data. 5 ESnet Science Engagement (engage@es.net) - 4/21/14

The Central Role of the Network The very structure of modern science assumes science networks exist: high performance, feature rich, global scope What is The Network anyway? The Network is the set of devices and applications involved in the use of a remote resource - This is not about supercomputer interconnects - This is about data flow from experiment to analysis, between facilities, etc. User interfaces for The Network portal, data transfer tool, workflow engine Therefore, servers and applications must also be considered What is important? 1. Correctness 2. Consistency 3. Performance 6 ESnet Science Engagement (engage@es.net) - 4/21/14

Sample Data Transfer Rates This table available at:" http://fasterdata.es.net/fasterdata-home/requirements-and-expectations/" " 7 ESnet Science Engagement (engage@es.net) - 4/21/14 7

TCP Ubiquitous and Fragile Networks provide connectivity between hosts how do hosts see the network? From an application s perspective, the interface to the other end is a socket Communication is between applications mostly over TCP 2013 icanhascheezburger.com TCP the fragile workhorse TCP is (for very good reasons) timid packet loss is interpreted as congestion Packet loss in conjunction with latency is a performance killer Like it or not, TCP is used for the vast majority of data transfer applications (more than 95% of ESnet traffic is TCP) 8 ESnet Science Engagement (engage@es.net) - 4/21/14

A small amount of packet loss makes a huge difference in TCP performance Local (LAN) Metro Area With loss, high performance beyond metro distances is essentially impossible Regional Continental International Measured (TCP Reno) Measured (HTCP) Theoretical (TCP Reno) Measured (no loss) 4/21/14 9 ESnet Science Engagement (engage@es.net) - 4/21/14

Where Are The Problems? Source Campus S Congested or faulty links between domains Backbone Latency dependant problems inside domains with small RTT Destination Campus D Congested intracampus links NREN Regional 10 ESnet Science Engagement (engage@es.net) - 4/21/14

Local Testing Will Not Find Everything Performance is poor when RTT exceeds ~10 ms Performance is good when RTT is < ~10 ms Source Campus R&E Backbone Destination Campus S D Regional Regional Switch with small buffers 11 ESnet Science Engagement (engage@es.net) - 4/21/14

Soft Network Failures Soft failures are where basic connectivity functions, but high performance is not possible. TCP was intentionally designed to hide all transmission errors from the user: As long as the TCPs continue to function properly and the internet system does not become completely partitioned, no transmission errors will affect the users. (From IEN 129, RFC 716) Some soft failures only affect high bandwidth long RTT flows. Hard failures are easy to detect & fix soft failures can lie hidden for years! One network problem can often mask others 12 ESnet Science Engagement (engage@es.net) - 4/21/14

The Metrics We Care about Use the correct tool for the Job To determine the correct tool, maybe we need to start with what we want to accomplish What do we care about measuring? Packet Loss, DuplicaAon, out- of- orderness (transport layer) Achievable Bandwidth (e.g. Throughput ) Latency (Round Trip and One Way) JiMer (Delay variaaon) Interface UAlizaAon/Discards/Errors (network layer) Traveled Route MTU Feedback 13 ESnet Science Engagement (engage@es.net) - 4/21/14

Network Monitoring All networks do some form monitoring. Addresses needs of local staff for understanding state of the network o Would this information be useful to external users? o Can these tools function on a multi-domain basis? Beyond passive methods, there are active tools. o E.g. often we want a throughput number. Can we automate that idea? o Wouldn t it be nice to get some sort of plot of performance over the course of a day? Week? Year? Multiple endpoints? Where is the Measurement Middleware? Something to allow for the easy exchange of metrics that are collected locally, on a global scale? 14 ESnet Science Engagement (engage@es.net) - 4/21/14

Overview Introduction The Ghost of perfsonar Past perfsonar Present Use Cases Future Directions & Unfinished Business 15 ESnet Science Engagement (engage@es.net) - 4/21/14

The Ghost of perfsonar Past Internet2 Performance Evaluation and Review Framework (PERF) - ~2002 End-to-End Path Router Router Regularly Scheduled Tests On-Demand Tests Test Request Laptop computer Server Result Request Test Results Test Results Database of Performance Results Test Results Server 16 ESnet Science Engagement (engage@es.net) - 4/21/14

The Ghost of perfsonar Past Internet2 Performance Evaluation and Review Framework (PERF) - ~2002 Regularly Scheduled Tests On-Demand Tests Result Collection Backbone Node Network Backbone Test Data End Node Network Backbone Regional Node Backbone Node Backbone Node Regional Node End Node End Node Regional Test Data Application Domain Test Data 17 ESnet Science Engagement (engage@es.net) - 4/21/14

The Ghost of perfsonar Past Internet2 E2E pipes (End-to-End Performance Initiatives Performance Environment System) ~2003/2004 18 ESnet Science Engagement (engage@es.net) - 4/21/14

Courtesy of E. Boyd @ Internet2 Problem: Nobody s Fault Applications Developer Hey, this is not working right! LAN Administrator Not our problem Talk to the other guys Others are getting in ok LAN Administrator Applications Developer System Administrator The computer Is working OK Campus Networking Gigapop How do you solve a problem along a path? Everyone says it is working fine! Everything is AOK No other complaints Backbone Campus Networking Gigapop All the lights are green We don t see anything wrong The network is lightly loaded System Administrator Looks fine

The Ghost of perfsonar Past GEANT2/JRA1 Framework Layers (~2004) User Interface User Interface 1 User Interface 2 Domain Controllers Domain Controller - domain A Domain Controller - domain B Domain Controller - domain C Measurement Points Domain A Domain B Domain C Available Bandwidth Measurement Points 20 ESnet Science Engagement (engage@es.net) - 4/21/14 One-way delay Measurement Points type 1 One-way delay Measurement Points type 2

The Ghost of perfsonar Past Global Grid Forum (~2003) Hierarchy of Network Performance Characteristics http://www.ogf.org/ documents/gfd.23.pdf Request Schema Requirements and Sample Implementation Report Schema Requirements and Sample Implementation 21 ESnet Science Engagement (engage@es.net) - 4/21/14

The Ghost of perfsonar Past ~2004/2005 Merging the efforts of GGF/OGF, Internet2 E2E pipes, GN2/ JRA1 SONAR : You could call it pipes v2.0 (if you re American) or GFD (if you re European) A Services-Based Measurement Framework for Building Dynamic, Self-Organizing Performance Communities Added additional partners from ESnet and RNP First release in ~2005 ps/mdm Offerings in ~2007 ps Performance Toolkit ~2009 ~1000 Deployed Instances ~2013 22 ESnet Science Engagement (engage@es.net) - 4/21/14

Overview Introduction The Ghost of perfsonar Past perfsonar Present Use Cases Future Directions & Unfinished Business 23 ESnet Science Engagement (engage@es.net) - 4/21/14

What is perfsonar? perfsonar is a tool to: Set network performance expectations Find network problems ( soft failures ) Help fix these problems All in multi-domain environments These problems are all harder when multiple networks are involved perfsonar is provides a standard way to publish active and passive monitoring data This data is interesting to network researchers as well as network operators 24 ESnet Science Engagement (engage@es.net) - 4/21/14

perfsonar Toolkit The perfsonar Toolkit is an open source implementation and packaging of the perfsonar measurement infrastructure and protocols from ESnet and Internet2 http://psps.perfsonar.net All components are available as RPMs, and bundled into a CentOS 6- based netinstall and a Live CD perfsonar tools are much more accurate if run on a dedicated perfsonar host, not on the DTN Very easy to install and configure Usually takes less than 30 minutes 25 ESnet Science Engagement (engage@es.net) - 4/21/14

perfsonar Toolkit Services PS-Toolkit includes these measurement tools: BWCTL: network throughput OWAMP: network loss, delay, and jitter traceroute Test scheduler: runs bwctl, traceroute, and owamp tests on a regular interval Measurement Archives (data publication) SNMP MA router interface Data psb MA -- results of bwctl, owamp, and traceroute tests Lookup Service: used to find services PS-Toolkit includes these web100-based Troubleshooting Tools NDT (TCP analysis, duplex mismatch, etc.) NPAD (TCP analysis, router queuing analysis, etc) 26 ESnet Science Engagement (engage@es.net) - 4/21/14

perfsonar Present 27 ESnet Science Engagement (engage@es.net) - 4/21/14

Lookup Service Directory Search: http://stats.es.net/servicesdirectory/ 28 ESnet Science Engagement (engage@es.net) - 4/21/14

perfsonar Dashboard: http://ps-dashboard.es.net 29 ESnet Science Engagement (engage@es.net) - 4/21/14

Overview Introduction The Ghost of perfsonar Past perfsonar Present Use Cases Future Directions & Unfinished Business 30 ESnet Science Engagement (engage@es.net) - 4/21/14

Congestion via OWAMP Loss 31 ESnet Science Engagement (engage@es.net) - 4/21/14

Soft Routing/Interface Failures Rebooted router with full route table normal performance Gradual failure of optical line card Gb/s degrading performance repair one month 32 ESnet Science Engagement (engage@es.net) - 4/21/14

Failing Optics (BWCTL and Utilization) Example taken from Internet2 backbone 33 ESnet Science Engagement (engage@es.net) - 4/21/14

Firewalls When used as a comparison tool we can see that security devices are impacting performance Powerful incentive to fix this in the scientific path 34 ESnet Science Engagement (engage@es.net) - 4/21/14

Host Tuning Simple example play with the settings in /etc/sysctl.conf when running some BWCTL tests. See if you can pick out when we raised the memory for the TCP window 35 ESnet Science Engagement (engage@es.net) - 4/21/14

Overview Introduction The Ghost of perfsonar Past perfsonar Present Use Cases Future Directions & Unfinished Business 36 ESnet Science Engagement (engage@es.net) - 4/21/14

Future Directions & Unfinished Business Deployment Installation via ISO/Live CD was the greatest catalyst for deployment. This brought measurement tools to the unsophisticated user What does perfsonar on a small appliance (cubox, Intel NUC, Raspberry PI) look like, and what do we need to do to support it? Tools There will always be new tools the data abstraction supports sharing SDN integration? How can monitoring data be used to fundamentally change networking protocols? Use Cases Designed by engineers for engineers except when the Scientists and CEOs discovered what it can do Can a simple web page be made that allows a live tests between anything in the world? Could we get the same software used on the server, integrated into the routing device? Adoption Scaling to 1000+ instances was great how can we scale to 10000? Greater emphasis on the project means more helpers needed 37 ESnet Science Engagement (engage@es.net) - 4/21/14

The perfsonar Project at 10 Years: Status and Trajectory Questions/Comments/Criticisms? Jason Zurawski - zurawski@es.net ESnet Science Engagement engage@es.net http://psps.perfsonar.net