Monitoring for IT Services and WLCG. Alberto AIMAR CERN-IT for the MONIT Team

Size: px
Start display at page:

Download "Monitoring for IT Services and WLCG. Alberto AIMAR CERN-IT for the MONIT Team"

Transcription

1

2 Monitoring for IT Services and WLCG Alberto AIMAR CERN-IT for the MONIT Team 2

3 Outline Scope and Mandate Architecture and Data Flow Technologies and Usage WLCG Monitoring IT DC and Services Monitoring 3

4 Scope and Mandate 4

5 Monitoring - Scope Data Centres Monitoring Monitoring of DCs at CERN and Wigner Hardware, operating system, and services Data Centres equipment, PDUs, temperature sensors, etc. Metrics and logs Experiment Dashboards WLCG Monitoring Sites availability, data transfers, job information, reports Used by WLCG, experiments, sites and users 3

6 Services Proposed Cover IT DC, Services and WLCG Monitor, collect, visualize, process, aggregate, alarm Metrics and Logs Infrastructure operations and scale Helping and supporting Interfacing new data sources Developing custom processing, aggregations, alarms Building dashboards and reports 6

7 Data Centres Monitoring (meter) 7

8 Experiment Dashboards Job monitoring, sites availability, data management and transfers Used by experiments operation teams, sites, users, WLCG 8

9 WLCG Monitoring - Mandate Regroup monitoring activities hosted by CERN IT Monitoring of Data Centres, WLCG and Experiment Dashboards ETF, HammerCloud testing frameworks Uniform with standard CERN IT practices Management of services, communication, tools Review existing monitoring usage and needs (IT, WLCG, etc.) Investigate, implement established open source technologies Reduce dependencies on in-house software and on few experts Continue support, while preparing the new services 9

10 Architecture and Data Flow 10

11 Infrastructure Monitoring Job Monitoring Data mgmt and transfers Data Centres Monitoring Previous Monitoring Data Sources Metrics Manager Lemon Agent XSLS ATLAS Rucio FTS Servers DPM Servers XROOTD Servers CRAB2 CRAB3 WM Agent Farmout Grid Control CMS Connect PANDA WMS ProdSys Nagios VOFeed OIM GOCDB REBUS Transport Flume AMQ Kafka Flume AMQ GLED HTTP Collector z SQL Collector MonaLISA Collector AMQ HTTP GET HTTP PUT Storage &Search HDFS ElasticSearch Oracle ElasticSearch HDFS Oracle ElasticSearch Oracle ElasticSearch Processing & Aggregation Spark Hadoop Jobs GNI Oracle PL/SQL ESPER Spark Oracle PL/SQL ES Queries ESPER Display Access Kibana Jupyter Zeppelin Dashboards (ED) Kibana Zeppelin Real Time (ED) Accounting (ED) API (ED) SSB (ED) SAM3 (ED) API (ED)

12 Unified Monitoring Data Sources Transport Storage &Search Metrics Manager Lemon Agent XSLS ATLAS Rucio FTS Servers DPM Servers XROOTD Servers CRAB2 CRAB3 WM Agent Farmout Grid Control Flume AMQ z Kafka Hadoop HDFS ElasticSearch InfluxDB Processing & Aggregation CMS Connect PANDA WMS ProdSys Nagios VOFeed OIM GOCDB Spark Hadoop Jobs GNI REBUS Data Access Kibana Grafana Jupyter (Swan) Zeppelin

13 Flume Kafka sink Flume sinks Data Sources FTS Rucio XRootD Jobs User Data Lemon syslog app log AMQ DB HTTP feed Logs Lemon metrics Flume AMQ Flume DB Flume HTTP Flume Log GW Flume Metric GW Unified Monitoring Architecture Transport Kafka cluster (buffering) * Processing Data enrichment Data aggregation Batch Processing User Jobs Storage & Search HDFS Elastic Search Others (influxdb) Data Access CLI, API User Views ~100 data producers 3.5 TB/day 75k docs/sec 3 days retention in Kafka 13 spark jobs 24/7 13

14 Data Sources FTS Rucio XRootD Jobs User Data Lemon syslog app log collectd LogStash Unified Data Sources AMQ DB HTTP feed Logs Lemon metrics Metrics (http) Flume AMQ Flume DB Flume HTTP Flume Log GW Flume Lemon Metric GW Flume Metric GW Transport Flume Kafka sink Data is all channeled via Flume via gateways Validated and normalized if necessary (e.g. standard names, date formats) Adding new Data Sources is documented and fairly simple (User Data) 14 Available both for Metrics (IT, WLCG, etc.) and Logs (hw logs, OS logs, syslogs, app logs)

15 Unified Processing Flume Kafka sink Transport Kafka cluster (buffering) * Flume sinks Flume sinks Processing (e.g. Enrich FTS transfer metrics with WLCG topology from AGIS/GOCdb) User Jobs Flume sinks Proven useful many times 15

16 Data Processing Stream processing Data enrichment Join information from several sources (e.g. WLCG topology) Data aggregation Over time (e.g. summary statistics for a time bin) Over other dimensions (e.g. compute a cumulative metric for a set of machines hosting the same service) Data correlation Advanced Alarming: detect anomalies and failures correlating data from multiple sources (e.g. data centre topology-aware alarms) Batch processing Reprocessing, data compression, historical data, periodic reports 16

17 Unified Access Storage & Search Data Access Flume sinks HDFS User Views Flume sinks Elastic Search Flume sinks Influx DB CLI, API Reports Plots Scripts Default dashboards, and can be customized and extended fairly easily Multiple data access methods (dashboards, notebooks, CLI) Dashboards, reports and access via scripts to create new User Views or reports 17

18 Technologies and Usage Data Sources Storage and Search Data Access 18

19 MONIT Data Sources FTS Rucio XRootD Jobs User Data Lemon syslog app log collectd LogStash AMQ DB HTTP feed Logs Lemon metrics Metrics (http) MONIT Data is validated and normalized if necessary (e.g. standard names, date formats) Available both for Metrics (IT, WLCG, etc.) and Logs (hw logs, OS, syslogs, app logs) Adding User Data Sources is rather easy Data Sources Components vs Technology Purpose Access Metrics via Collectd Uses collectd plug-ins pre-installed for CERN host and services Log Sources via FLUME Move logs data via a Flume local agent or any HTTP client External Sources via HTTP Endpoint for data from external sources (few, well connected) Open to any source within CERN External Sources via AMQ AMQ Messaging end point Open to sources on WAN 19

20 MONIT Storage and Search MONIT Storage & Search Elastic Search Influx DB HDFS Searches Dashboards Reports CLI, API Default dashboards, and can be customized and extended fairly easily Provide data access methods (dashboards, notebooks, CLI) Dashboards, reports and access via scripts to create User Views or reports Mainstream and established technologies, do not need experts to develop them Multiple back ends suited for different use cases Storage and Search Technology Purpose Data type Retention Period ElasticSearch Short-term storage and index Raw data, metrics and logs 1 month, with current resources InfluxDB Time series storage Aggregated (1w/raw, 1M/10 m ) 5 years HDFS Long-term archive Raw data, metrics and logs forever 20

21 MONIT Data Access MONIT Plots Reports User Views CLI, API Scripts Data Access Technology Purpose Data Storage Kibana Full search/filter/discovery of data ElasticSearch Grafana Dashboards optimized for time series plots. Local alarms ElasticSearch, InfluxDB Zeppelin Notebooks for analysis, reports and plots. Native support for Spark HDFS API and CLIs Access from external applications, scripts, Python, etc. HDFS 21

22 Current Status and Progress 22

23 Infrastructure(s) - Current Numbers Designed, developed and deployed a new monitoring infrastructure capable of handling CERN IT and WLCG data > 150 VMs, 3.6 TB/day, 6 billion docs /day Maintenance legacy WLCG infrastructure and tools ~ 90 VMs Maintenance legacy ITMON infrastructure and tools (i.e. meter, timber) ~ 150 VMs 23

24 Infrastructure(s) - Operations Building and tuning the complete infrastructure Supporting existing services Depending on many external services ES, InfluxDB, HDFS Some also new and being set up Securing infrastructure Flume/Kafka/Spark/ES/HDFS Configuring infrastructure (Puppet 4) 24

25 New WLCG Monitoring 25

26 Current Data / WLCG WLCG Data Sources FTS XROOTD XROOTD ALICE DDM RUCIO DDM TRACES DDM ACCOUNTING PHEDEX ATLAS JM PANDA/PRODSYS CMS JM - HT CONDOR SAM3 ETF AGIS VOFEED REBUS GOCDB OIM Additional Data Sources ASAP ATLAS CRAB OPS CMS SPACEMON GLIDEINWMS LHCOPN BOINC - LHCATHOME PROTODUNE DAQ WMAGENTS WM ARCHIVE WLCG SPACE ACCOUNTING 26

27 Current Processing / WLCG Validation and Transformation Fields Verifications (e.g. check timestamp in milliseconds in all doc) Fields Extractions (e.g. extract FTS log link, transfer ID) Fields Computations (e.g. create unique document ID based on other fields Field Normalization: apply common names (e.g. dst_site, dst_country, lowercase, etc.) Enrichment Aggregation Specific Processing Topology Resolution Join raw data with AGIS, VOFeed, REBUS, etc. Binning over time Summary data (e.g. for a given interval) Specific Spark Jobs (e.g. efficiency = success vs. failures) DDM Site avail Job monitoring and accounting FTS, XRootD transfers, rates Sites Availability, profiles (prototype) 27

28 Current Views: MONIT Portal 28

29 WLCG Transfers 29

30 XRootD Transfers 9/25/

31 Transfers Overviews (T0, T1, T2) 9/25/

32 Site Availability Data 9/25/

33 Sites Availability Profiles 9/25/

34 Other Data in MONIT for WLCG Examples of other data sources LHCOPN network traffic (BOINC) WLCG Space Accounting OpenStack at CERN 34

35 LHCOPN 35

36 LHCOPN vs WLCG Transfers 9/25/

37 WLCG Space Accounting 9/25/

38 New IT DC Monitoring 38

39 New DC Monitoring using Collectd Lemon Agent is the last component in production from the old Lemon/DC Monitoring Moving to collectd collect system and service metrics optimized to handle thousands of metrics modular and portable with hundreds of plugins available easy to develop new plugins in Python/Java/C continuously improving and well documented 39

40 Data Sources FTS Rucio XRootD Jobs User Data Lemon syslog app log collectd LogStash Collect Data Source AMQ DB HTTP feed Logs Lemon metrics Metrics (http) Flume AMQ Flume DB Flume HTTP Flume Log GW Flume Lemon Metric GW Flume Metric GW Transport Flume Kafka sink Only component to add and use the full MONIT infrastructure All existing IT monitoring will be replaced (meter, notifications, dashbords) 40

41 Collectd Collectd Data Source Load Plugin Exec Plugin Lemon Plugin Sampling Write_HTTP Plugin HTTP Source Avro Sink Enrichment (e.g. host information, environment) Monitoring Infrastructure Validation Transformation Client on the host 41

42 Collectd Metrics and Plugins 9/25/

43 Replacement Strategy 1. Use an existing collectd plugin (recommended) Straightforward: main logic can be reused Many similarities at API level registermetric() => register_read() storesample() => dispatch() 2. Extend standard collectd plugin Requires development 3. Run lemon sensor using collectd wrapper 43

44 Alarms GNI (Snow/ ) Collectd Agents Local Alarms HTTP proxy Kafka Spark HTTP proxy Grafana InfluxDB Grafana Alarms = alarm Advanced Alarms 44

45 Reference and Contact Dashboards (CERN SSO login) monit.cern.ch Feedback/Requests (SNOW) cern.ch/monit-support Documentation cern.ch/monitdocs 45

46

47 Additional Info and Backup Slides 47

48 Notebooks with Zeppelin Extract Data from HDFS or ES 48

49 Manipulate the data and plot with common languages and tools Python Scala numpy 9/25/

50 Notebooks with Swan Extract Data from ES ROOT Python C++ CVMFS 9/25/

Monitoring system for geographically distributed datacenters based on Openstack. Gioacchino Vino

Monitoring system for geographically distributed datacenters based on Openstack. Gioacchino Vino Monitoring system for geographically distributed datacenters based on Openstack Gioacchino Vino Tutor: Dott. Domenico Elia Tutor: Dott. Giacinto Donvito Borsa di studio GARR Orio Carlini 2016-2017 INFN

More information

Kibana, Grafana and Zeppelin on Monitoring data

Kibana, Grafana and Zeppelin on Monitoring data Kibana, Grafana and Zeppelin on Monitoring data Internal group presentaion Ildar Nurgaliev OpenLab Summer student Presentation structure About IT-CM-MM Section and myself Visualisation with Kibana 4 and

More information

Analytics Platform for ATLAS Computing Services

Analytics Platform for ATLAS Computing Services Analytics Platform for ATLAS Computing Services Ilija Vukotic for the ATLAS collaboration ICHEP 2016, Chicago, USA Getting the most from distributed resources What we want To understand the system To understand

More information

Big Data Analytics Tools. Applied to ATLAS Event Data

Big Data Analytics Tools. Applied to ATLAS Event Data Big Data Analytics Tools Applied to ATLAS Event Data Ilija Vukotic University of Chicago CHEP 2016, San Francisco Idea Big Data technologies have proven to be very useful for storage, visualization and

More information

Monitoring WLCG with lambda-architecture: a new scalable data store and analytics platform for monitoring at petabyte scale.

Monitoring WLCG with lambda-architecture: a new scalable data store and analytics platform for monitoring at petabyte scale. Journal of Physics: Conference Series PAPER OPEN ACCESS Monitoring WLCG with lambda-architecture: a new scalable data store and analytics platform for monitoring at petabyte scale. To cite this article:

More information

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time

More information

Thales PunchPlatform Agenda

Thales PunchPlatform Agenda Thales PunchPlatform Agenda What It Does Building Blocks PunchPlatform team Deployment & Operations Typical Setups Customers and Use Cases RoadMap 1 What It Does Compose Arbitrary Industrial Data Processing

More information

The Art of Container Monitoring. Derek Chen

The Art of Container Monitoring. Derek Chen The Art of Container Monitoring Derek Chen 2016.9.22 About me DevOps Engineer at Trend Micro Agile transformation Micro service and cloud service Docker integration Monitoring system development Automate

More information

WLCG Transfers Dashboard: a Unified Monitoring Tool for Heterogeneous Data Transfers.

WLCG Transfers Dashboard: a Unified Monitoring Tool for Heterogeneous Data Transfers. WLCG Transfers Dashboard: a Unified Monitoring Tool for Heterogeneous Data Transfers. J Andreeva 1, A Beche 1, S Belov 2, I Kadochnikov 2, P Saiz 1 and D Tuckett 1 1 CERN (European Organization for Nuclear

More information

PNDA.io: when BGP meets Big-Data

PNDA.io: when BGP meets Big-Data PNDA.io: when BGP meets Big-Data Let s go back in time 26 th April 2017 The Internet is very much alive Millions of BGP events occurring every day 15 Routers Monitored 410 active peers (both IPv4 and IPv6)

More information

The ATLAS Software Installation System v2 Alessandro De Salvo Mayuko Kataoka, Arturo Sanchez Pineda,Yuri Smirnov CHEP 2015

The ATLAS Software Installation System v2 Alessandro De Salvo Mayuko Kataoka, Arturo Sanchez Pineda,Yuri Smirnov CHEP 2015 The ATLAS Software Installation System v2 Alessandro De Salvo Mayuko Kataoka, Arturo Sanchez Pineda,Yuri Smirnov CHEP 2015 Overview Architecture Performance LJSFi Overview LJSFi is an acronym of Light

More information

Monitoring of large-scale federated data storage: XRootD and beyond.

Monitoring of large-scale federated data storage: XRootD and beyond. Monitoring of large-scale federated data storage: XRootD and beyond. J Andreeva 1, A Beche 1, S Belov 2, D Diguez Arias 1, D Giordano 1, D Oleynik 2, A Petrosyan 2, P Saiz 1, M Tadel 3, D Tuckett 1 and

More information

Linux Clusters Institute: Monitoring. Zhongtao Zhang, System Administrator, Holland Computing Center, University of Nebraska-Lincoln

Linux Clusters Institute: Monitoring. Zhongtao Zhang, System Administrator, Holland Computing Center, University of Nebraska-Lincoln Linux Clusters Institute: Monitoring Zhongtao Zhang, System Administrator, Holland Computing Center, University of Nebraska-Lincoln Why monitor? 2 Service Level Agreement (SLA) Which services must be provided

More information

Fluentd + MongoDB + Spark = Awesome Sauce

Fluentd + MongoDB + Spark = Awesome Sauce Fluentd + MongoDB + Spark = Awesome Sauce Nishant Sahay, Sr. Architect, Wipro Limited Bhavani Ananth, Tech Manager, Wipro Limited Your company logo here Wipro Open Source Practice: Vision & Mission Vision

More information

Table 1 The Elastic Stack use cases Use case Industry or vertical market Operational log analytics: Gain real-time operational insight, reduce Mean Ti

Table 1 The Elastic Stack use cases Use case Industry or vertical market Operational log analytics: Gain real-time operational insight, reduce Mean Ti Solution Overview Cisco UCS Integrated Infrastructure for Big Data with the Elastic Stack Cisco and Elastic deliver a powerful, scalable, and programmable IT operations and security analytics platform

More information

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) The Certificate in Software Development Life Cycle in BIGDATA, Business Intelligence and Tableau program

More information

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development:: Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized

More information

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may

More information

ANSE: Advanced Network Services for [LHC] Experiments

ANSE: Advanced Network Services for [LHC] Experiments ANSE: Advanced Network Services for [LHC] Experiments Artur Barczyk California Institute of Technology Joint Techs 2013 Honolulu, January 16, 2013 Introduction ANSE is a project funded by NSF s CC-NIE

More information

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES 1 THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB

More information

COSMOS. Controls Open Source MOnitoring System Project Status Report. Frank on behalf of the COSMOS core team BE-CO Technical Meeting:

COSMOS. Controls Open Source MOnitoring System Project Status Report. Frank on behalf of the COSMOS core team BE-CO Technical Meeting: COSMOS Controls Open Source MOnitoring System Project Status Report Frank Frank on behalf of the COSMOS core team BE-CO Technical Meeting: 09-11-2017 Laura Julien Luigi Felix Sergey Agenda ABACUS review

More information

High-Performance Event Processing Bridging the Gap between Low Latency and High Throughput Bernhard Seeger University of Marburg

High-Performance Event Processing Bridging the Gap between Low Latency and High Throughput Bernhard Seeger University of Marburg High-Performance Event Processing Bridging the Gap between Low Latency and High Throughput Bernhard Seeger University of Marburg common work with Nikolaus Glombiewski, Michael Körber, Marc Seidemann 1.

More information

Real-time monitoring Slurm jobs with InfluxDB September Carlos Fenoy García

Real-time monitoring Slurm jobs with InfluxDB September Carlos Fenoy García Real-time monitoring Slurm jobs with InfluxDB September 2016 Carlos Fenoy García Agenda Problem description Current Slurm profiling Our solution Conclusions Problem description Monitoring of jobs is becoming

More information

Using Prometheus with InfluxDB for metrics storage

Using Prometheus with InfluxDB for metrics storage Using Prometheus with InfluxDB for metrics storage Roman Vynar Senior Site Reliability Engineer, Quiq September 26, 2017 About Quiq Quiq is a messaging platform for customer service. https://goquiq.com

More information

Application monitoring with BELK. Nishant Sahay, Sr. Architect Bhavani Ananth, Architect

Application monitoring with BELK. Nishant Sahay, Sr. Architect Bhavani Ananth, Architect Application monitoring with BELK Nishant Sahay, Sr. Architect Bhavani Ananth, Architect Why logs Business PoV Input Data Analytics User Interactions /Behavior End user Experience/ Improvements 2017 Wipro

More information

WLCG Network Throughput WG

WLCG Network Throughput WG WLCG Network Throughput WG Shawn McKee, Marian Babik for the Working Group HEPiX Tsukuba 16-20 October 2017 Working Group WLCG Network Throughput WG formed in the fall of 2014 within the scope of WLCG

More information

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance

More information

Lenses 2.1 Enterprise Features PRODUCT DATA SHEET

Lenses 2.1 Enterprise Features PRODUCT DATA SHEET Lenses 2.1 Enterprise Features PRODUCT DATA SHEET 1 OVERVIEW DataOps is the art of progressing from data to value in seconds. For us, its all about making data operations as easy and fast as using the

More information

microsoft

microsoft 70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series

More information

@InfluxDB. David Norton 1 / 69

@InfluxDB. David Norton  1 / 69 @InfluxDB David Norton (@dgnorton) david@influxdb.com 1 / 69 Instrumenting a Data Center 2 / 69 3 / 69 4 / 69 The problem: Efficiently monitor hundreds or thousands of servers 5 / 69 The solution: Automate

More information

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez Scientific data processing at global scale The LHC Computing Grid Chengdu (China), July 5th 2011 Who I am 2 Computing science background Working in the field of computing for high-energy physics since

More information

Overview. About CERN 2 / 11

Overview. About CERN 2 / 11 Overview CERN wanted to upgrade the data monitoring system of one of its Large Hadron Collider experiments called ALICE (A La rge Ion Collider Experiment) to ensure the experiment s high efficiency. They

More information

Streamlining CASTOR to manage the LHC data torrent

Streamlining CASTOR to manage the LHC data torrent Streamlining CASTOR to manage the LHC data torrent G. Lo Presti, X. Espinal Curull, E. Cano, B. Fiorini, A. Ieri, S. Murray, S. Ponce and E. Sindrilaru CERN, 1211 Geneva 23, Switzerland E-mail: giuseppe.lopresti@cern.ch

More information

One Pool To Rule Them All The CMS HTCondor/glideinWMS Global Pool. D. Mason for CMS Software & Computing

One Pool To Rule Them All The CMS HTCondor/glideinWMS Global Pool. D. Mason for CMS Software & Computing One Pool To Rule Them All The CMS HTCondor/glideinWMS Global Pool D. Mason for CMS Software & Computing 1 Going to try to give you a picture of the CMS HTCondor/ glideinwms global pool What s the use case

More information

Evolution of Cloud Computing in ATLAS

Evolution of Cloud Computing in ATLAS The Evolution of Cloud Computing in ATLAS Ryan Taylor on behalf of the ATLAS collaboration 1 Outline Cloud Usage and IaaS Resource Management Software Services to facilitate cloud use Sim@P1 Performance

More information

FUJITSU Software ServerView Cloud Monitoring Manager V1.1. Release Notes

FUJITSU Software ServerView Cloud Monitoring Manager V1.1. Release Notes FUJITSU Software ServerView Cloud Monitoring Manager V1.1 Release Notes J2UL-2170-01ENZ0(00) July 2016 Contents Contents About this Manual... 4 1 What's New?...6 1.1 Performance Improvements... 6 1.2

More information

Big Data Tools as Applied to ATLAS Event Data

Big Data Tools as Applied to ATLAS Event Data Big Data Tools as Applied to ATLAS Event Data I Vukotic 1, R W Gardner and L A Bryant University of Chicago, 5620 S Ellis Ave. Chicago IL 60637, USA ivukotic@uchicago.edu ATL-SOFT-PROC-2017-001 03 January

More information

Elasticsearch & ATLAS Data Management. European Organization for Nuclear Research (CERN)

Elasticsearch & ATLAS Data Management. European Organization for Nuclear Research (CERN) Elasticsearch & ATAS Data Management European Organization for Nuclear Research (CERN) ralph.vigne@cern.ch mario.lassnig@cern.ch ATAS Analytics Platform proposed eb. 2015; work in progress; correlate data

More information

9.2(1)SU1 OL

9.2(1)SU1 OL Prerequisites, page 1 workflow, page 2 architecture considerations, page 2 Deployment model considerations, page 3 for Large PoD and Small PoD, page 4 for Micro Node, page 8 Prerequisites Before you plan

More information

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without

More information

Qualys Cloud Platform

Qualys Cloud Platform 18 QUALYS SECURITY CONFERENCE 2018 Qualys Cloud Platform Looking Under the Hood: What Makes Our Cloud Platform so Scalable and Powerful Dilip Bachwani Vice President, Engineering, Qualys, Inc. Cloud Platform

More information

RIPE NCC Routing Information Service (RIS)

RIPE NCC Routing Information Service (RIS) RIPE NCC Routing Information Service (RIS) Overview Colin Petrie 14/12/2016 RON++ What is RIS? What is RIS? Worldwide network of BGP collectors Deployed at Internet Exchange Points - Including at AMS-IX

More information

Oracle Enterprise Manager 12c IBM DB2 Database Plug-in

Oracle Enterprise Manager 12c IBM DB2 Database Plug-in Oracle Enterprise Manager 12c IBM DB2 Database Plug-in May 2015 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and

More information

The InfluxDB-Grafana plugin for Fuel Documentation

The InfluxDB-Grafana plugin for Fuel Documentation The InfluxDB-Grafana plugin for Fuel Documentation Release 0.8.0 Mirantis Inc. December 14, 2015 Contents 1 User documentation 1 1.1 Overview................................................. 1 1.2 Release

More information

The Wuppertal Tier-2 Center and recent software developments on Job Monitoring for ATLAS

The Wuppertal Tier-2 Center and recent software developments on Job Monitoring for ATLAS The Wuppertal Tier-2 Center and recent software developments on Job Monitoring for ATLAS DESY Computing Seminar Frank Volkmer, M. Sc. Bergische Universität Wuppertal Introduction Hardware Pleiades Cluster

More information

Spatial Analytics Built for Big Data Platforms

Spatial Analytics Built for Big Data Platforms Spatial Analytics Built for Big Platforms Roberto Infante Software Development Manager, Spatial and Graph 1 Copyright 2011, Oracle and/or its affiliates. All rights Global Digital Growth The Internet of

More information

Evaluation of the TimescaleDB PostgreSQL Time Series extension ID: 17834

Evaluation of the TimescaleDB PostgreSQL Time Series extension ID: 17834 Preliminary draft 17:58 16 September 2018 16 September 2018 elenastefancova@gmail.com Evaluation of the TimescaleDB PostgreSQL Time Series extension ID: 17834 Elena Štefancová Supervisors: Ignacio Coterillo

More information

Data pipelines with PostgreSQL & Kafka

Data pipelines with PostgreSQL & Kafka Data pipelines with PostgreSQL & Kafka Oskari Saarenmaa PostgresConf US 2018 - Jersey City Agenda 1. Introduction 2. Data pipelines, old and new 3. Apache Kafka 4. Sample data pipeline with Kafka & PostgreSQL

More information

Resource Allocation Resource Usage Data Access Control. Network Intelligence, Guidance. Statistics, States, Objects and Events.

Resource Allocation Resource Usage Data Access Control. Network Intelligence, Guidance. Statistics, States, Objects and Events. Resource Allocation Resource Usage Data Access Control POLICY ENGINE Network Intelligence, Guidance APPLICATIONS & PaaS ANALYTICS Workflow SERVICE ORCHESTRATION AND CONTROL NETWORK Statistics, States,

More information

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423

More information

Using Prometheus Operator to monitor OpenStack

Using Prometheus Operator to monitor OpenStack Using Prometheus Operator to monitor OpenStack Monitoring at Scale Pradeep Kilambi & Franck Baudin / Anandeep Pannu Engineering Mgr NFV Senior Principal Product Manager 15 November 2018 What we will be

More information

Towards Monitoring-as-a-service for Scientific Computing Cloud applications using the ElasticSearch ecosystem

Towards Monitoring-as-a-service for Scientific Computing Cloud applications using the ElasticSearch ecosystem Journal of Physics: Conference Series PAPER OPEN ACCESS Towards Monitoring-as-a-service for Scientific Computing Cloud applications using the ElasticSearch ecosystem Recent citations - Andrei Talas et

More information

Exam Questions

Exam Questions Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) https://www.2passeasy.com/dumps/70-775/ NEW QUESTION 1 You are implementing a batch processing solution by using Azure

More information

Monitoring the ALICE Grid with MonALISA

Monitoring the ALICE Grid with MonALISA Monitoring the ALICE Grid with MonALISA 2008-08-20 Costin Grigoras ALICE Workshop @ Sibiu Monitoring the ALICE Grid with MonALISA MonALISA Framework library Data collection and storage in ALICE Visualization

More information

CloudExpo November 2017 Tomer Levi

CloudExpo November 2017 Tomer Levi CloudExpo November 2017 Tomer Levi About me Full Stack Engineer @ Intel s Advanced Analytics group. Artificial Intelligence unit at Intel. Responsible for (1) Radical improvement of critical processes

More information

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Using the SDACK Architecture to Build a Big Data Product Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Outline A Threat Analytic Big Data product The SDACK Architecture Akka Streams and data

More information

Hadoop. Introduction / Overview

Hadoop. Introduction / Overview Hadoop Introduction / Overview Preface We will use these PowerPoint slides to guide us through our topic. Expect 15 minute segments of lecture Expect 1-4 hour lab segments Expect minimal pretty pictures

More information

The Future of Real-Time in Spark

The Future of Real-Time in Spark The Future of Real-Time in Spark Reynold Xin @rxin Spark Summit, New York, Feb 18, 2016 Why Real-Time? Making decisions faster is valuable. Preventing credit card fraud Monitoring industrial machinery

More information

CERN: LSF and HTCondor Batch Services

CERN: LSF and HTCondor Batch Services Batch @ CERN: LSF and HTCondor Batch Services Iain Steers, Jérôme Belleman, Ulrich Schwickerath IT-PES-PS INFN Visit: Batch Batch @ CERN 2 Outline The Move Environment Grid Pilot Local Jobs Conclusion

More information

Cloud & container monitoring , Lars Michelsen Check_MK Conference #4

Cloud & container monitoring , Lars Michelsen Check_MK Conference #4 Cloud & container monitoring 04.05.2018, Lars Michelsen Some cloud definitions Applications Data Runtime Middleware O/S Virtualization Servers Storage Networking Software-as-a-Service (SaaS) Applications

More information

WOMBATOAM OPERATIONS & MAINTENANCE FOR ERLANG & ELIXIR SYSTEMS

WOMBATOAM OPERATIONS & MAINTENANCE FOR ERLANG & ELIXIR SYSTEMS version 3.0.0 3.0.0 IMPROVEMENTS OVER 2.0.0 MONITORING ++ Extensive dashboard redesign with a new, more intuitive user interface using GridStack. ++ Improved Mnesia netsplit service to detect and fix partitions

More information

WOMBATOAM OPERATIONS & MAINTENANCE FOR ERLANG & ELIXIR SYSTEMS

WOMBATOAM OPERATIONS & MAINTENANCE FOR ERLANG & ELIXIR SYSTEMS version 3.0.0 3.0.0 IMPROVEMENTS OVER 2.0.0 MONITORING ++ Extensive dashboard redesign with a new, more intuitive user interface using GridStack. ++ Improved Mnesia netsplit service to detect and fix partitions

More information

Deep Security Integration with Sumo Logic

Deep Security Integration with Sumo Logic A Trend Micro White Paper I May 2016 Install, Integrate and Analyze» This paper is aimed at information security and solution architects looking to integrate the Trend Micro Deep Security with Sumo Logic.

More information

The LCG 3D Project. Maria Girone, CERN. The 23rd Open Grid Forum - OGF23 4th June 2008, Barcelona. CERN IT Department CH-1211 Genève 23 Switzerland

The LCG 3D Project. Maria Girone, CERN. The 23rd Open Grid Forum - OGF23 4th June 2008, Barcelona. CERN IT Department CH-1211 Genève 23 Switzerland The LCG 3D Project Maria Girone, CERN The rd Open Grid Forum - OGF 4th June 2008, Barcelona Outline Introduction The Distributed Database (3D) Project Streams Replication Technology and Performance Availability

More information

Big Data Hadoop Course Content

Big Data Hadoop Course Content Big Data Hadoop Course Content Topics covered in the training Introduction to Linux and Big Data Virtual Machine ( VM) Introduction/ Installation of VirtualBox and the Big Data VM Introduction to Linux

More information

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Hadoop 1.0 Architecture Introduction to Hadoop & Big Data Hadoop Evolution Hadoop Architecture Networking Concepts Use cases

More information

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo Microsoft Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo NEW QUESTION 1 HOTSPOT You install the Microsoft Hive ODBC Driver on a computer that runs Windows

More information

Lambda Architecture for Batch and Stream Processing. October 2018

Lambda Architecture for Batch and Stream Processing. October 2018 Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.

More information

Network Automation using modern tech. Egor Krivosheev 2degrees

Network Automation using modern tech. Egor Krivosheev 2degrees Network Automation using modern tech Egor Krivosheev 2degrees Key parts of network automation today Streaming Telemetry APIs SNMP and screen scraping are still around NETCONF RFC6241 XML encoding Most

More information

Services. Service descriptions. Cisco HCS services

Services. Service descriptions. Cisco HCS services Service descriptions, page 1 Infrastructure Platform Automation Description, page 5 Infrastructure Manager Sync Introduction, page 5 Service descriptions After the installation of the Cisco HCM-F platform,

More information

Employing HPC DEEP-EST for HEP Data Analysis. Viktor Khristenko (CERN, DEEP-EST), Maria Girone (CERN)

Employing HPC DEEP-EST for HEP Data Analysis. Viktor Khristenko (CERN, DEEP-EST), Maria Girone (CERN) Employing HPC DEEP-EST for HEP Data Analysis Viktor Khristenko (CERN, DEEP-EST), Maria Girone (CERN) 1 Outline The DEEP-EST Project Goals and Motivation HEP Data Analysis on HPC with Apache Spark on HPC

More information

OPENSTACK AGILITY. RED HAT RELIABILITY.

OPENSTACK AGILITY. RED HAT RELIABILITY. OPENSTACK AGILITY. RED HAT RELIABILITY. Operational Management How is it really done? And what should OpenStack do about it? Anandeep Pannu Senior Principal Product Manager 7 November 2017 Ops Management

More information

Big Data Infrastructure at Spotify

Big Data Infrastructure at Spotify Big Data Infrastructure at Spotify Wouter de Bie Team Lead Data Infrastructure September 26, 2013 2 Who am I? According to ZDNet: "The work they have done to improve the Apache Hive data warehouse system

More information

Regain control thanks to Prometheus. Guillaume Lefevre, DevOps Engineer, OCTO Technology Etienne Coutaud, DevOps Engineer, OCTO Technology

Regain control thanks to Prometheus. Guillaume Lefevre, DevOps Engineer, OCTO Technology Etienne Coutaud, DevOps Engineer, OCTO Technology Regain control thanks to Prometheus Guillaume Lefevre, DevOps Engineer, OCTO Technology Etienne Coutaud, DevOps Engineer, OCTO Technology About us Guillaume Lefevre DevOps Engineer, OCTO Technology @guillaumelfv

More information

#MicroFocusCyberSummit

#MicroFocusCyberSummit #MicroFocusCyberSummit Data Simplicity: ArcSight Data Platform enhances enterprise data via the Common Event Format Peter Titov Micro Focus #MicroFocusCyberSummit Agenda Usage Ingestion Management Solutions

More information

Cloud du CCIN2P3 pour l' ATLAS VO

Cloud du CCIN2P3 pour l' ATLAS VO Centre de Calcul de l Institut National de Physique Nucléaire et de Physique des Particules Cloud du CCIN2P3 pour l' ATLAS VO Vamvakopoulos Emmanouil «Rencontres LCG France» IRFU Saclay 1 2 December 2014

More information

The TDAQ Analytics Dashboard: a real-time web application for the ATLAS TDAQ control infrastructure

The TDAQ Analytics Dashboard: a real-time web application for the ATLAS TDAQ control infrastructure The TDAQ Analytics Dashboard: a real-time web application for the ATLAS TDAQ control infrastructure Giovanna Lehmann Miotto, Luca Magnoni, John Erik Sloper European Laboratory for Particle Physics (CERN),

More information

Innovatus Technologies

Innovatus Technologies HADOOP 2.X BIGDATA ANALYTICS 1. Java Overview of Java Classes and Objects Garbage Collection and Modifiers Inheritance, Aggregation, Polymorphism Command line argument Abstract class and Interfaces String

More information

Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Love to Scale

Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Love to Scale Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Love to Scale Luca Canali, CERN About Luca Data Engineer and team lead at CERN Hadoop and Spark service, database services 18+ years

More information

Search Engines and Time Series Databases

Search Engines and Time Series Databases Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Search Engines and Time Series Databases Corso di Sistemi e Architetture per Big Data A.A. 2017/18

More information

Open-Falcon A Distributed and High-Performance Monitoring System. Yao-Wei Ou & Lai Wei 2017/05/22

Open-Falcon A Distributed and High-Performance Monitoring System. Yao-Wei Ou & Lai Wei 2017/05/22 Open-Falcon A Distributed and High-Performance Monitoring System Yao-Wei Ou & Lai Wei 2017/05/22 Let us begin with a little story Grafana PR#3787 [feature] Add Open-Falcon datasource I'm sorry but we will

More information

Using DC/OS for Continuous Delivery

Using DC/OS for Continuous Delivery Using DC/OS for Continuous Delivery DevPulseCon 2017 Elizabeth K. Joseph, @pleia2 Mesosphere 1 Elizabeth K. Joseph, Developer Advocate, Mesosphere 15+ years working in open source communities 10+ years

More information

Monitor your infrastructure with the Elastic Beats. Monica Sarbu

Monitor your infrastructure with the Elastic Beats. Monica Sarbu Monitor your infrastructure with the Elastic Beats Monica Sarbu Monica Sarbu Team lead, Beats team Email: monica@elastic.co Twitter: 2 Monitor your servers Apache logs 3 Monitor your servers Apache logs

More information

Smarter Storage with Containerized Applications. Always Aligned with your Changing World

Smarter Storage with Containerized Applications. Always Aligned with your Changing World Smarter Storage with Containerized Applications Always Aligned with your Changing World Gabriel Lopez Solutions Architect, Zadara Storage Accumulated years of experience in advanced software design and

More information

Oracle Enterprise Manager 12c Sybase ASE Database Plug-in

Oracle Enterprise Manager 12c Sybase ASE Database Plug-in Oracle Enterprise Manager 12c Sybase ASE Database Plug-in May 2015 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only,

More information

Data Architectures in Azure for Analytics & Big Data

Data Architectures in Azure for Analytics & Big Data Data Architectures in for Analytics & Big Data October 20, 2018 Melissa Coates Solution Architect, BlueGranite Microsoft Data Platform MVP Blog: www.sqlchick.com Twitter: @sqlchick Data Architecture A

More information

Monitoring and Analytics With HTCondor Data

Monitoring and Analytics With HTCondor Data Monitoring and Analytics With HTCondor Data William Strecker-Kellogg RACF/SDCC @ BNL 1 RHIC/ATLAS Computing Facility (SDCC) Who are we? See our last two site reports from the HEPiX conference for a good

More information

Introduction to SciTokens

Introduction to SciTokens Introduction to SciTokens Brian Bockelman, On Behalf of the SciTokens Team https://scitokens.org This material is based upon work supported by the National Science Foundation under Grant No. 1738962. Any

More information

Elasticsearch. Presented by: Steve Mayzak, Director of Systems Engineering Vince Marino, Account Exec

Elasticsearch. Presented by: Steve Mayzak, Director of Systems Engineering Vince Marino, Account Exec Elasticsearch Presented by: Steve Mayzak, Director of Systems Engineering Vince Marino, Account Exec What about Elasticsearch the Company?! Support 100s of Companies in Production environments Training

More information

Time Series Live 2017

Time Series Live 2017 1 Time Series Schemas @Percona Live 2017 Who Am I? Chris Larsen Maintainer and author for OpenTSDB since 2013 Software Engineer @ Yahoo Central Monitoring Team Who I m not: A marketer A sales person 2

More information

Consuming Model-Driven Telemetry

Consuming Model-Driven Telemetry Consuming Model-Driven Telemetry Cristina Precup & Stefan Braicu Software Systems Engineers Cisco Spark How Questions? Use Cisco Spark to communicate with the speaker after the session 1. Find this session

More information

R-GMA (Relational Grid Monitoring Architecture) for monitoring applications

R-GMA (Relational Grid Monitoring Architecture) for monitoring applications R-GMA (Relational Grid Monitoring Architecture) for monitoring applications www.eu-egee.org egee EGEE-II INFSO-RI-031688 Acknowledgements Slides are taken/derived from the GILDA team Steve Fisher (RAL,

More information

MSG: An Overview of a Messaging System for the Grid

MSG: An Overview of a Messaging System for the Grid MSG: An Overview of a Messaging System for the Grid Daniel Rodrigues Presentation Summary Current Issues Messaging System Testing Test Summary Throughput Message Lag Flow Control Next Steps Current Issues

More information

Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Copyright 2018, Oracle and/or its affiliates. All rights reserved. Beyond SQL Tuning: Insider's Guide to Maximizing SQL Performance Monday, Oct 22 10:30 a.m. - 11:15 a.m. Marriott Marquis (Golden Gate Level) - Golden Gate A Ashish Agrawal Group Product Manager Oracle

More information

GoDocker. A batch scheduling system with Docker containers

GoDocker. A batch scheduling system with Docker containers GoDocker A batch scheduling system with Docker containers Web - http://www.genouest.org/godocker/ Code - https://bitbucket.org/osallou/go-docker Twitter - #godocker Olivier Sallou IRISA - 2016 CC-BY-SA

More information

Overview. SUSE OpenStack Cloud Monitoring

Overview. SUSE OpenStack Cloud Monitoring Overview SUSE OpenStack Cloud Monitoring Overview SUSE OpenStack Cloud Monitoring Publication Date: 08/04/2017 SUSE LLC 10 Canal Park Drive Suite 200 Cambridge MA 02141 USA https://www.suse.com/documentation

More information

ELFms industrialisation plans

ELFms industrialisation plans ELFms industrialisation plans CERN openlab workshop 13 June 2005 German Cancio CERN IT/FIO http://cern.ch/elfms ELFms industrialisation plans, 13/6/05 Outline Background What is ELFms Collaboration with

More information

exam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0

exam.   Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0 70-775.exam Number: 70-775 Passing Score: 800 Time Limit: 120 min File Version: 1.0 Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight Version 1.0 Exam A QUESTION 1 You use YARN to

More information

LOG AGGREGATION. To better manage your Red Hat footprint. Miguel Pérez Colino Strategic Design Team - ISBU

LOG AGGREGATION. To better manage your Red Hat footprint. Miguel Pérez Colino Strategic Design Team - ISBU LOG AGGREGATION To better manage your Red Hat footprint Miguel Pérez Colino Strategic Design Team - ISBU 2017-05-03 @mmmmmmpc Agenda Managing your Red Hat footprint with Log Aggregation The Situation The

More information

ATLAS Analysis Workshop Summary

ATLAS Analysis Workshop Summary ATLAS Analysis Workshop Summary Matthew Feickert 1 1 Southern Methodist University March 29th, 2016 Matthew Feickert (SMU) ATLAS Analysis Workshop Summary March 29th, 2016 1 Outline 1 ATLAS Analysis with

More information