Grid monitoring with MonAlisa

Similar documents
Internet2 Meeting September 2005

An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems. June 2007

MONitoring Agents using a Large Integrated Services Architecture. Iosif Legrand California Institute of Technology

Iosif Legrand. California Institute of Technology

Monitoring the ALICE Grid with MonALISA

Monitoring and Control of Large Scale Distributed Systems

MONALISA: A MONITORING FRAMEWORK FOR LARGE SCALE COMPUTING SYSTEMS

DIPLOMA PROJECT. Distributed Delay Constrained Multicast Algorithms for the MonALISA Framework

Grid Tutorial Networking

Measuring the available bandwidth in a network (end to end). Integration in MonALISA

Table of Contents. LISA User Guide

POLITEHNICA UNIVERSITY OF BUCHAREST THE FACULTY OF COMPUTER SCIENCE AND AUTOMATIC CONTROL DIPLOMA PROJECT DISTRIBUTED ALGORITHMS

Multinode Scalability and WAN Deployments

HA solution with PXC-5.7 with ProxySQL. Ramesh Sivaraman Krunal Bauskar

vrealize Operations Management Pack for NSX for vsphere Release Notes

Percona XtraDB Cluster ProxySQL. For your high availability and clustering needs

Understanding Feature and Network Services in Cisco Unified Serviceability

ANSE: Advanced Network Services for [LHC] Experiments

SIMULATION FRAMEWORK FOR MODELING LARGE-SCALE DISTRIBUTED SYSTEMS. Dobre Ciprian Mihai *, Cristea Valentin *, Iosif C. Legrand **

Policy Manager for IBM WebSphere DataPower 7.2: Configuration Guide

System Description. System Architecture. System Architecture, page 1 Deployment Environment, page 4

The ATLAS Software Installation System v2 Alessandro De Salvo Mayuko Kataoka, Arturo Sanchez Pineda,Yuri Smirnov CHEP 2015

DYNES: DYnamic NEtwork System

Cisco Configuration Engine 2.0

Global Platform for Rich Media Conferencing and Collaboration

MySQL High Availability. Michael Messina Senior Managing Consultant, Rolta-AdvizeX /

Session 7: Configuration Manager

Adaptive Cluster Computing using JavaSpaces

Presented By: Ian Kelley

Troubleshooting Cisco DCNM

Percona XtraDB Cluster

IBM WebSphere Application Server 8. Clustering Flexible Management

Outline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems

vrealize Operations Management Pack for NSX for vsphere 3.5 Release Notes

Oracle WebLogic Server 12c: Administration I

Service Mesh and Microservices Networking

Coherence & WebLogic Server integration with Coherence (Active Cache)

WLS Neue Optionen braucht das Land

vrealize Operations Management Pack for NSX for vsphere 3.0

Cisco ISE Ports Reference

A Simulation Model for Large Scale Distributed Systems

Migration and Building of Data Centers in IBM SoftLayer

Microsoft Exchange Server 2013 and 2016 Deployment

Port Usage Information for the IM and Presence Service

Distributed Systems Principles and Paradigms. Chapter 12: Distributed Web-Based Systems

Matthias Wobben working in Berlin, Germany. Senior Sales Engineer at Nextcloud

Oracle 10g and IPv6 IPv6 Summit 11 December 2003

Cisco ISE Ports Reference

The Collaboration Cornerstone

EUROPEAN MIDDLEWARE INITIATIVE

CYAN SECURE WEB Installing on Windows

OpenFlow: What s it Good for?

Enable Remote Registry Modification Schema Master

Optimize and Accelerate Your Mission- Critical Applications across the WAN

Cisco ISE Ports Reference

Identity Firewall. About the Identity Firewall

AMGA metadata catalogue system

BlackBerry Enterprise Server for IBM Lotus Domino Version: 5.0. Administration Guide

1 Modular architecture

New Features for ASA Version 9.0(2)

Project Management. Projects CHAPTER

System Specification

Functional Requirements for Grid Oriented Optical Networks

Multi-threaded, discrete event simulation of distributed computing systems

Oracle Enterprise Manager. 1 Before You Install. System Monitoring Plug-in for Oracle Unified Directory User's Guide Release 1.0

New Product: Cisco Catalyst 2950 Series Fast Ethernet Desktop Switches

Port Usage Information for the IM and Presence Service

USING ARTIFACTORY TO MANAGE BINARIES ACROSS MULTI-SITE TOPOLOGIES

Service-Oriented Programming

Configuration Guide. BlackBerry UEM Cloud

Network Security and Topology

Chapter 18 Distributed Systems and Web Services

Monitoring of a distributed computing system: the Grid

Installing and Configuring vcloud Connector

Veeam Cloud Connect. Version 8.0. Administrator Guide

Product Name DCS v MozyPro v2.0 Summary Multi-platform server-client online (Internet / LAN) backup software with web management console

Cisco Wide Area Bonjour Solution Overview

CACHE ME IF YOU CAN! GETTING STARTED WITH AMAZON ELASTICACHE. AWS Charlotte Meetup / Charlotte Cloud Computing Meetup Bilal Soylu October 2013

Cisco Unified Contact Center Express in Hosted Collaboration Deployment

Service-Oriented Architecture for Command and Control Systems with Dynamic Reconfiguration

Implementation Guide - VPN Network with Static Routing

Delivers cost savings, high definition display, and supercharged sharing

Installing SmartSense on HDP

Cisco ISE Ports Reference

Realtime visitor analysis with Couchbase and Elasticsearch

Multi-Data Center MySQL with Continuent Tungsten

Cisco Wide Area Application Services: Secure, Scalable, and Simple Central Management

Integration of Network Services Interface version 2 with the JUNOS Space SDK

Rule based Forwarding (RBF): improving the Internet s flexibility and security. Lucian Popa, Ion Stoica, Sylvia Ratnasamy UC Berkeley Intel Labs

BlackBerry Enterprise Server for IBM Lotus Domino Version: 5.0. Feature and Technical Overview

Open Mic Webcast. Jumpstarting Audio- Video Deployments Tony Payne March 9, 2016

Cisco Unified Communications Manager TCP and UDP Port

A User-level Secure Grid File System

Cisco Unified Communications Manager TCP and UDP Port

OpenIAM Identity and Access Manager Technical Architecture Overview

BlackBerry Enterprise Server for Microsoft Exchange Version: 5.0. Feature and Technical Overview

Kaazing Gateway: An Open Source

Cisco Certified Network Associate ( )

ELFms industrialisation plans

UiB 1. april 04. Sun Microsystems

Transcription:

Grid monitoring with MonAlisa Iosif Legrand (Caltech) Iosif.Legrand@cern.ch Laura Duta (UPB) laura.duta@cs.pub.ro Alexandru Herisanu (UPB) aherisanu@cs.pub.ro

Agenda What is MonAlisa? Who uses it? A little bit of design Monitoring Traffic and Jobs Network Topology, Latency and Routers Managing Optical Switches and Paths The VRVS and EVO system Summary INFSO-RI-508833 University Politehnica of Bucharest 2

What is MonAlisa? The MonALISA system is designed as an ensemble of autonomous multi-threaded, self-describing agent-based subsystems which are registered as dynamic services, and are able to collaborate and cooperate in performing a wide range of monitoring tasks and to analyze and process this information in a distributed way to provide optimization decisions in large scale distributed applications.

What MonAlisa provides Reliable Registration and Discovery for Services and Applications. Monitoring all aspects of complex systems : System information for computer nodes and clusters Network information : WAN and LAN Monitoring the performance of Applications or services The End User Systems Can interact with any other services to provide in near real-time customized / filtered information based on monitoring data Secure, remote administration for services and applications Agents to supervise applications, to restart or reconfigure them, and to notify other services when certain conditions are detected. The MonALISA framework can be used to develop higher level decision ion services, implemented as a distributed network of communicating agents, to perform global optimization tasks. Powerful Graphical User Interfaces

MonAlisa communities Disun CMS-US sites CMS CDF D0 SAR ABILENE backbone GLORIAD STAR ALICE VRVS System RoEduNET backbone INTERNET2 PIPES OSG VRVS ABILENE It has been used for Demonstrations - at: - CMS-DC04 SC 2005 - Bandwith Challenge Record: 131 Gbps CHEP06 GRID3 Terena Netw 2005 ALICE ICFA Wrksh. 05 SC 2004

A little bit of design ML Client ML Repository

MonAlisa System Services Clients, HL services repositories Proxies MonALISA service Global Services or Clients Dynamic load balancing Scalability & Replication Security Distributed Information System. Fully Distributed Discovery Network of JINI-LUSs Dynamic - based on a lease Secure & Public Mechanism and REN

MonAlisa service and Data Handling Client (other service) Web client WEB Service WSDL SOAP Postgres DB MySQL Monitor Data Stores Data Cache Service & DB Registration Lookup Service Communications via the ML Proxy data Lookup Service Discovery Applications MonALISA Service Predicates & Agents Configuration Control (SSL) Client (other service) Java User defined loadable Modules to write /sent data MDS

Registration, Discovery and AAA MonALISA Service Trust keystore MonALISA Service Admin SSL connection MonALISA Service Trust keystore Registration (signed certificate) Lookup Service Lookup Service Discovery Services Proxy Multiplexer Services Proxy Multiplexer AAA services Client (other service) Client authentication Client (other service)

Monitoring traffic and jobs

Monitoring the execution of jobs SPLIT JOBS LIFELINES for JOBS Summit a Job Job1 Job Job DAG Job2 Job3 Job 31 Job 32

Monitoring SPEED MonAlisa Fast Data Transfer 17Gbps full duplex at SC2006

Monitoring Network Topology, Latency, Routers NETWORKS ROUTERS AS Real Time Topology Discovery & Display

On demand Optical Path

Monitoring and Controlling Optical Planes Controlling Port power monitoring Glimmerglass Switch Example

Agents to Create on Demand an Optical Path or Tree 2 Discovery & Secure Connection ML Demon ML Agent MonALISA 3 1 Runs a ML Demon >ml_pathml_path IP1 IP4 copy file IP4 Optical Switch Control & Monitor the switch ML Agent MonALISA Optical Switch Optical Switch ML Agent MonALISA The time to create a path on demand is less than 1s independent of the location and the number of connections 4 ML proxy services used in Agent Communication

Host Monitoring with UltraLight Many network problems are actually endhost problems: misconfigured or underpowered endsystems Host/System TCP Network Settings Device Information The Information LISA application (see Iosif s talk later) was designed to monitor the endhost and its view of the network. For SC 05 we developed a Perl script to gather the relevant host details related to network performance and integrated the script with ApMon (an API for MonALISA) to allow us to publish this data to a MonALISA repository. Information on the system information, TCP configuration and network device setup was gathered and accessible from one site. Future plans are to coordinate this with LISA and deploy this as part of OSG. The Tier-2 centers are a primary target.

Available Bandwidth Measurements Embedded Pathload module.

Enforces measurement fairness Coordination Service for Available Bandwidth Measurements Avoids multiple probes on shared network segments Dynamic configuration of measure-ment timing Logs events Provides service redundancy by using a master-slave model

Monitoring VRVS Reflectors and Communication Topology

Creating a Dynamic, Global, Minimum Spanning Tree to optimize the connectivity A weighted connected graph G = (V,E) with n vertices and m edges. The quality of connectivity between any two reflectors is measured every 2s. Building in near real time a minimumspanning tree T w( T ) = ( v, u ) T w(( v, u ))

Summary MonaLISA is a fully distributed service system with no single point of failure. It provides reliable registration and discovery of services and applications. MonALISA is interfaced with many monitoring tools and is capable to collect information from different applications It allows to analyze and process information locally, using Filters or Agents that are dynamically deployed to provide customized information to other services or clients or to trigger predefined actions. Can be used to control and monitor any other applications. Agents can be used to supervise applications, to restart or reconfigure them, and to notify other services when certain conditions are detected. Provides a secure administration interface which allows to remotely control (start / stop/ reconfigure / upgrade) distributed services or applications. The Agent system in the MonALISA framework can be used to develop higher level services, implemented as a distributed network of communicating agents, to perform global optimization tasks. It proved to be a stable and reliable distributed service system ~180 Sites running MonALISA