Monitoring for IT Services and WLCG
Alberto AIMAR, CERN-IT, for the MONIT Team


Outline
- Scope and Mandate
- Architecture and Data Flow
- Technologies and Usage
- WLCG Monitoring
- IT DC and Services Monitoring

Scope and Mandate

Monitoring - Scope
Data Centres Monitoring
- Monitoring of the DCs at CERN and Wigner
- Hardware, operating system, and services
- Data centre equipment: PDUs, temperature sensors, etc.
- Metrics and logs
Experiment Dashboards / WLCG Monitoring
- Site availability, data transfers, job information, reports
- Used by WLCG, the experiments, sites and users

Services Proposed
- Cover IT DC, Services and WLCG
- Monitor, collect, visualize, process, aggregate, alarm
- Metrics and logs
- Infrastructure operations and scale
Helping and supporting
- Interfacing new data sources
- Developing custom processing, aggregations, alarms
- Building dashboards and reports

Data Centres Monitoring (meter)

Experiment Dashboards
- Job monitoring, site availability, data management and transfers
- Used by the experiments' operation teams, sites, users, WLCG

WLCG Monitoring - Mandate
- Regroup the monitoring activities hosted by CERN IT: monitoring of Data Centres, WLCG and Experiment Dashboards; the ETF and HammerCloud testing frameworks
- Uniform with standard CERN IT practices: management of services, communication, tools
- Review existing monitoring usage and needs (IT, WLCG, etc.)
- Investigate and implement established open source technologies
- Reduce dependencies on in-house software and on a few experts
- Continue support, while preparing the new services

Architecture and Data Flow

Previous Monitoring
Diagram of the previous stacks: Data Centres Monitoring, Infrastructure Monitoring, Job Monitoring, Data mgmt and transfers
- Data Sources: Metrics Manager, Lemon Agent, XSLS, ATLAS Rucio, FTS Servers, DPM Servers, XROOTD Servers, CRAB2, CRAB3, WM Agent, Farmout, Grid Control, CMS Connect, PANDA, WMS, ProdSys, Nagios, VOFeed, OIM, GOCDB, REBUS
- Transport: Flume, AMQ, Kafka, GLED, HTTP Collector, SQL Collector, MonaLISA Collector, HTTP GET, HTTP PUT
- Storage & Search: HDFS, ElasticSearch, Oracle
- Processing & Aggregation: Spark, Hadoop Jobs, GNI, Oracle PL/SQL, ESPER, ES Queries
- Display / Access: Kibana, Jupyter, Zeppelin, Dashboards (ED), Real Time (ED), Accounting (ED), API (ED), SSB (ED), SAM3 (ED)

Unified Monitoring
- Data Sources: Metrics Manager, Lemon Agent, XSLS, ATLAS Rucio, FTS Servers, DPM Servers, XROOTD Servers, CRAB2, CRAB3, WM Agent, Farmout, Grid Control, CMS Connect, PANDA, WMS, ProdSys, Nagios, VOFeed, OIM, GOCDB, REBUS
- Transport: Flume, AMQ, Kafka
- Storage & Search: Hadoop HDFS, ElasticSearch, InfluxDB
- Processing & Aggregation: Spark, Hadoop Jobs, GNI
- Data Access: Kibana, Grafana, Jupyter (Swan), Zeppelin

Unified Monitoring Architecture
- Data Sources: FTS, Rucio, XRootD, Jobs, Lemon metrics, syslog, app logs, User Data (via AMQ, DB, HTTP feed, Logs)
- Transport: Flume AMQ, Flume DB, Flume HTTP, Flume Log GW and Flume Metric GW, feeding a Flume Kafka sink into a Kafka cluster (buffering)
- Processing: data enrichment, data aggregation, batch processing, user jobs
- Storage & Search (via Flume sinks): HDFS, ElasticSearch, others (InfluxDB)
- Data Access: CLI, API, User Views
Scale: ~100 data producers, 3.5 TB/day, 75k docs/sec, 3 days retention in Kafka, 13 Spark jobs running 24/7
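As a rough consistency check on the scale figures of the architecture slide, the sketch below converts the quoted rates into daily volumes. It assumes that 75k docs/sec and 3.5 TB/day are sustained daily averages and ignores Kafka replication and compression, neither of which the slide specifies.

```python
# Back-of-the-envelope check of the transport-layer figures quoted on the slide.
# Assumption: 75k docs/sec and 3.5 TB/day are sustained daily averages;
# the Kafka replication factor and compression are ignored (not given on the slide).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s

docs_per_second = 75_000
tb_per_day = 3.5
kafka_retention_days = 3

docs_per_day = docs_per_second * SECONDS_PER_DAY
avg_doc_size_bytes = tb_per_day * 1e12 / docs_per_day
kafka_buffer_tb = tb_per_day * kafka_retention_days

print(f"docs/day: {docs_per_day:,}")
print(f"average document size: {avg_doc_size_bytes:.0f} bytes")
print(f"data held in Kafka for {kafka_retention_days} days: {kafka_buffer_tb:.1f} TB")
```

The result, about 6.5 billion documents per day at roughly 540 bytes each and about 10.5 TB held in Kafka before replication, is consistent with the 6 billion docs/day quoted for the whole infrastructure.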

Unified Data Sources
- Sources: FTS, Rucio, XRootD, Jobs, Lemon, syslog, app logs, collectd, LogStash, User Data
- Inputs (AMQ, DB, HTTP feed, Logs, Lemon metrics, Metrics (http)) enter through Flume AMQ, Flume DB, Flume HTTP, Flume Log GW, Flume Lemon Metric GW and Flume Metric GW, and leave through the Flume Kafka sink
- All data is channeled through Flume gateways
- Data is validated and normalized if necessary (e.g. standard names, date formats)
- Adding new data sources is documented and fairly simple (User Data)
- Available both for metrics (IT, WLCG, etc.) and logs (hardware logs, OS logs, syslogs, app logs)
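A minimal sketch of the kind of validation and normalization applied at the gateways, assuming documents arrive as flat JSON dictionaries. The alias table, the lowercasing of keys and the conversion of every timestamp to epoch milliseconds are illustrative assumptions, not the actual MONIT schema.

```python
from datetime import datetime, timezone

# Hypothetical mapping from producer-specific field names to standard names.
FIELD_ALIASES = {
    "destination_site": "dst_site",
    "dest_site": "dst_site",
    "source_site": "src_site",
    "ts": "timestamp",
    "time": "timestamp",
}


def to_epoch_ms(value):
    """Convert an ISO-8601 string, epoch seconds or epoch milliseconds to epoch ms."""
    if isinstance(value, str):
        dt = datetime.fromisoformat(value)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
        return int(dt.timestamp() * 1000)
    value = float(value)
    # Heuristic: values below 1e11 are treated as seconds, larger ones as milliseconds.
    return int(value * 1000) if value < 1e11 else int(value)


def normalize(doc):
    """Return a copy of doc with lowercase standard field names and a ms timestamp."""
    out = {}
    for key, value in doc.items():
        std_key = FIELD_ALIASES.get(key.lower(), key.lower())
        out[std_key] = value
    if "timestamp" not in out:
        raise ValueError("document has no timestamp field")
    out["timestamp"] = to_epoch_ms(out["timestamp"])
    return out
```

For example, `normalize({"Dest_Site": "SITE-A", "ts": 1506297600})` yields `{"dst_site": "SITE-A", "timestamp": 1506297600000}`. Rejecting documents without a timestamp, rather than stamping them on arrival, is one possible policy among several.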

Unified Processing
- Flume Kafka sink -> Kafka cluster (buffering)* -> processing and user jobs -> Flume sinks
- Processing example: enrich FTS transfer metrics with the WLCG topology from AGIS/GOCDB
- * Kafka buffering has proven useful many times
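The FTS enrichment example can be pictured as a lookup join between each transfer record and a topology table. The sketch below assumes that the topology (as it might be derived from AGIS/GOCDB) has already been loaded into a dictionary keyed by storage endpoint hostname; the hostnames, site names and field names are hypothetical. In MONIT this kind of processing runs in Spark jobs; plain Python is used here only to show the join logic.

```python
# Hypothetical topology snapshot: storage endpoint host -> site metadata.
TOPOLOGY = {
    "se01.site-a.example": {"site": "SITE-A", "country": "Switzerland", "tier": 0},
    "se02.site-b.example": {"site": "SITE-B", "country": "Italy", "tier": 2},
}

# Placeholder used when an endpoint cannot be resolved.
UNKNOWN = {"site": "UNKNOWN", "country": "UNKNOWN", "tier": None}


def enrich_transfer(record, topology=TOPOLOGY):
    """Return a copy of a raw transfer record with source/destination site fields added."""
    enriched = dict(record)
    for side in ("src", "dst"):
        info = topology.get(record.get(f"{side}_host"), UNKNOWN)
        enriched[f"{side}_site"] = info["site"]
        enriched[f"{side}_country"] = info["country"]
        enriched[f"{side}_tier"] = info["tier"]
    return enriched
```

Unresolved endpoints are tagged `UNKNOWN` rather than dropped, so that gaps in the topology stay visible in the dashboards instead of silently reducing the transfer counts.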

Data Processing
Stream processing
- Data enrichment: join information from several sources (e.g. WLCG topology)
- Data aggregation over time (e.g. summary statistics for a time bin) and over other dimensions (e.g. compute a cumulative metric for a set of machines hosting the same service)
- Data correlation for advanced alarming: detect anomalies and failures by correlating data from multiple sources (e.g. data-centre-topology-aware alarms)
Batch processing
- Reprocessing, data compression, historical data, periodic reports
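Aggregation over time and over other dimensions amounts to grouping documents into fixed time bins and by a key (for example the service a machine hosts), then computing summary statistics per group. The sketch below assumes documents carry an epoch-millisecond `timestamp` and a numeric `value`; the 10-minute bin width and the field names are illustrative choices, and in MONIT this logic would run inside a Spark job rather than in plain Python.

```python
from collections import defaultdict

BIN_MS = 10 * 60 * 1000  # 10-minute bins (illustrative choice)


def aggregate(docs, key_field, value_field="value", bin_ms=BIN_MS):
    """Group docs by (time bin start, key_field) and return count/sum/min/max/mean."""
    groups = defaultdict(list)
    for doc in docs:
        bin_start = doc["timestamp"] - doc["timestamp"] % bin_ms
        groups[(bin_start, doc[key_field])].append(doc[value_field])

    summary = {}
    for (bin_start, key), values in sorted(groups.items()):
        summary[(bin_start, key)] = {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
    return summary
```

With `key_field="service"`, the `sum` entry is the cumulative metric for all machines hosting the same service within each bin.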

Unified Access
- Storage & Search (via Flume sinks): HDFS, ElasticSearch, InfluxDB
- Data Access: CLI, API, User Views (reports, plots, scripts)
- Default dashboards that can be customized and extended fairly easily
- Multiple data access methods (dashboards, notebooks, CLI)
- Dashboards, reports and script access to create new User Views or reports

Technologies and Usage
- Data Sources
- Storage and Search
- Data Access

MONIT Data Sources
- MONIT data is validated and normalized if necessary (e.g. standard names, date formats)
- Available both for metrics (IT, WLCG, etc.) and logs (hardware logs, OS logs, syslogs, app logs)
- Adding user data sources is rather easy

Data Sources Component | Technology | Purpose | Access
Metrics | collectd | Uses collectd plug-ins pre-installed for CERN hosts and services | -
Log sources | Flume | Move log data via a local Flume agent or any HTTP client | -
External sources | HTTP | Endpoint for data from external sources (few, well connected) | Open to any source within CERN
External sources | AMQ | AMQ messaging endpoint | Open to sources on the WAN
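For the HTTP path, a producer only needs an HTTP client that POSTs JSON documents. The sketch below uses only the Python standard library and builds the request without sending it. The endpoint URL and the `producer`/`type` fields are placeholders; the actual endpoint and required document fields are described in the MONIT documentation (cern.ch/monitdocs), not on this slide.

```python
import json
import time
import urllib.request

# Placeholder endpoint (hypothetical): replace with the URL from the MONIT documentation.
ENDPOINT = "http://monit-http-endpoint.example/"


def make_doc(producer, doc_type, payload):
    """Wrap a payload with hypothetical routing fields and a millisecond timestamp."""
    return {"producer": producer, "type": doc_type,
            "timestamp": int(time.time() * 1000), **payload}


def build_request(docs, endpoint=ENDPOINT):
    """Build (but do not send) a POST request carrying a JSON list of documents."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request([make_doc("myproducer", "metric", {"queue_length": 42})])
    print(req.get_method(), req.full_url, len(req.data), "bytes")
    # urllib.request.urlopen(req, timeout=10) would actually send it.
```

Batching several documents into one JSON list keeps the number of HTTP calls low, which matters for producers that emit many small metrics.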

MONIT Storage and Search
- Back ends: ElasticSearch, InfluxDB, HDFS, serving searches, dashboards, reports, CLI and API
- Default dashboards that can be customized and extended fairly easily
- Data access methods provided (dashboards, notebooks, CLI)
- Dashboards, reports and script access to create User Views or reports
- Mainstream, established technologies: no experts needed to develop them
- Multiple back ends suited for different use cases

Technology | Purpose | Data type | Retention period
ElasticSearch | Short-term storage and index | Raw data, metrics and logs | 1 month, with current resources
InfluxDB | Time series storage | Aggregated (1w/raw, 1M/10m) | 5 years
HDFS | Long-term archive | Raw data, metrics and logs | Forever
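The retention table suggests a simple rule for picking a back end for a given query. The helper below is a hypothetical illustration of that rule, encoding only the retention periods stated in the table (about one month in ElasticSearch, five years of aggregated data in InfluxDB, raw data kept indefinitely in HDFS); it is not part of any MONIT API, and the 30-day and 5 × 365-day cut-offs are approximations.

```python
def choose_backend(age_days, need_raw):
    """Pick a storage back end for data that is `age_days` old (hypothetical rule).

    Based on the retention table:
      - ElasticSearch keeps raw data for about 1 month (approximated as 30 days);
      - InfluxDB keeps aggregated data for 5 years (approximated as 5 * 365 days);
      - HDFS keeps raw data forever.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= 30:
        return "ElasticSearch"   # raw data still indexed and searchable
    if not need_raw and age_days <= 5 * 365:
        return "InfluxDB"        # aggregated time series only
    return "HDFS"                # long-term raw archive
```

For example, a dashboard showing last year's aggregated trend would go to InfluxDB, while reprocessing raw records from the same period would read from HDFS.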

MONIT Data Access
Outputs: plots, reports, User Views, CLI, API, scripts

Technology | Purpose | Data storage
Kibana | Full search/filter/discovery of data | ElasticSearch
Grafana | Dashboards optimized for time series plots; local alarms | ElasticSearch, InfluxDB
Zeppelin | Notebooks for analysis, reports and plots; native support for Spark | HDFS
API and CLIs | Access from external applications, scripts, Python, etc. | HDFS
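As an illustration of script access, the snippet below builds an ElasticSearch `_search` request using the standard Query DSL: a `bool` filter with a time range and a term match, plus a `terms` aggregation. The host, index pattern and field names (`timestamp`, `status`, `dst_site`) are hypothetical, and how MONIT actually exposes ElasticSearch to users (for example, behind a proxy with CERN SSO) is not covered by the slide.

```python
import json
import urllib.request

# Hypothetical host and index pattern.
ES_URL = "https://es-proxy.example/monit_fts_raw_*/_search"


def failed_transfers_query(minutes=60, size=10):
    """Query DSL body: failures in the last `minutes`, bucketed by destination site."""
    return {
        "size": 0,  # only the aggregation is needed, not the individual hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"timestamp": {"gte": f"now-{minutes}m"}}},
                    {"term": {"status": "FAILED"}},
                ]
            }
        },
        "aggs": {"by_dst_site": {"terms": {"field": "dst_site", "size": size}}},
    }


def build_search_request(body, url=ES_URL):
    """Build (but do not send) the POST request for the _search endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Using `filter` rather than `must` skips relevance scoring, which is not meaningful for monitoring queries and is cheaper to evaluate.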

Current Status and Progress

Infrastructure(s) - Current Numbers
- Designed, developed and deployed a new monitoring infrastructure capable of handling CERN IT and WLCG data: > 150 VMs, 3.6 TB/day, 6 billion docs/day
- Maintenance of the legacy WLCG infrastructure and tools: ~90 VMs
- Maintenance of the legacy ITMON infrastructure and tools (i.e. meter, timber): ~150 VMs

Infrastructure(s) - Operations
- Building and tuning the complete infrastructure
- Supporting existing services
- Depending on many external services (ES, InfluxDB, HDFS), some of them also new and still being set up
- Securing the infrastructure: Flume/Kafka/Spark/ES/HDFS
- Configuring the infrastructure (Puppet 4)

New WLCG Monitoring

Current Data / WLCG
- WLCG data sources: FTS, XROOTD, XROOTD ALICE, DDM RUCIO, DDM TRACES, DDM ACCOUNTING, PHEDEX, ATLAS JM PANDA/PRODSYS, CMS JM - HTCONDOR, SAM3 ETF, AGIS, VOFEED, REBUS, GOCDB, OIM
- Additional data sources: ASAP ATLAS, CRAB OPS CMS, SPACEMON, GLIDEINWMS, LHCOPN, BOINC - LHCATHOME, PROTODUNE DAQ, WMAGENTS, WM ARCHIVE, WLCG SPACE ACCOUNTING

Current Processing / WLCG
Validation and Transformation
- Field verification (e.g. check that the timestamp is in milliseconds in all documents)
- Field extraction (e.g. extract the FTS log link, transfer ID)
- Field computation (e.g. create a unique document ID based on other fields)
- Field normalization: apply common names (e.g. dst_site, dst_country, lowercase, etc.)
Enrichment
- Topology resolution: join raw data with AGIS, VOFeed, REBUS, etc.
Aggregation
- Binning over time
- Summary data (e.g. for a given interval)
Specific Processing
- Specific Spark jobs (e.g. efficiency = successes vs. failures)
- DDM; site availability; job monitoring and accounting; FTS and XRootD transfers, rates; site availability profiles (prototype)
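Three of the processing steps listed for WLCG are easy to illustrate: checking that a timestamp is in milliseconds, deriving a unique document ID from other fields, and computing an efficiency. A deterministic ID means that a reprocessed or duplicated document overwrites its earlier copy instead of being stored twice. The efficiency is computed here as successes / (successes + failures), one reading of "efficiency = success vs. failures". The field names, the sanity bounds and the choice of SHA-1 are assumptions for this sketch.

```python
import hashlib

# Sanity bounds for epoch milliseconds (assumption): 2001-09-09 up to 2286-11-20.
MIN_MS, MAX_MS = 10**12, 10**13


def is_millisecond_timestamp(ts):
    """True if ts looks like an epoch timestamp expressed in milliseconds."""
    return isinstance(ts, int) and MIN_MS <= ts < MAX_MS


def document_id(doc, fields=("src_site", "dst_site", "transfer_id", "timestamp")):
    """Deterministic ID: SHA-1 over the selected fields, so duplicates get the same ID."""
    key = "|".join(str(doc.get(f, "")) for f in fields)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def efficiency(successes, failures):
    """Fraction of successful attempts; None when there were no attempts."""
    total = successes + failures
    return successes / total if total else None
```

A timestamp in seconds such as 1506297600 fails the check, while 1506297600000 passes; the same transfer record always maps to the same 40-character hexadecimal ID.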

Current Views: MONIT Portal

WLCG Transfers

XRootD Transfers

Transfers Overviews (T0, T1, T2)

Site Availability Data

Sites Availability Profiles

Other Data in MONIT for WLCG
Examples of other data sources:
- LHCOPN network traffic
- BOINC
- WLCG Space Accounting
- OpenStack at CERN

LHCOPN

LHCOPN vs WLCG Transfers

WLCG Space Accounting

New IT DC Monitoring

New DC Monitoring using Collectd
- The Lemon Agent is the last component of the old Lemon/DC Monitoring still in production
- Moving to collectd to collect system and service metrics:
  - optimized to handle thousands of metrics
  - modular and portable, with hundreds of plugins available
  - easy to develop new plugins in Python/Java/C
  - continuously improving and well documented

Collectd Data Source
- Same data sources and Flume gateways as in the unified architecture (FTS, Rucio, XRootD, Jobs, Lemon, syslog, app logs, collectd, LogStash, User Data; Flume AMQ, Flume DB, Flume HTTP, Flume Log GW, Flume Lemon Metric GW, Flume Metric GW; Flume Kafka sink)
- collectd is the only component to add in order to use the full MONIT infrastructure
- All existing IT monitoring (meter, notifications, dashboards) will be replaced

Collectd
- Client on the host: collectd with the Load, Exec and Lemon plugins (sampling), sending data through the Write_HTTP plugin
- Monitoring infrastructure: HTTP source and Avro sink, with validation, transformation and enrichment (e.g. host information, environment)
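On the receiving side, the HTTP source gets the metrics serialized by collectd's write_http plugin. With `Format "JSON"`, each POST body is a JSON array of value lists carrying fields such as `values`, `dstypes`, `dsnames`, `time`, `interval`, `host`, `plugin` and `type`. The sketch below flattens such a payload into one document per data source and adds host information from a lookup table. The lookup table, its `hostgroup`/`environment` fields and the output field names are hypothetical; in MONIT the enrichment runs inside the ingestion layer, not in a standalone script.

```python
import json

# Hypothetical host inventory used for enrichment.
HOST_INFO = {
    "node001.example": {"hostgroup": "batch/worker", "environment": "production"},
}


def enrich_collectd_payload(body, host_info=HOST_INFO):
    """Turn a write_http JSON body into flat, enriched documents (one per data source)."""
    docs = []
    for vl in json.loads(body):
        extra = host_info.get(vl["host"], {"hostgroup": None, "environment": None})
        for name, dstype, value in zip(vl["dsnames"], vl["dstypes"], vl["values"]):
            docs.append({
                "host": vl["host"],
                "plugin": vl["plugin"],
                "plugin_instance": vl.get("plugin_instance", ""),
                "type": vl["type"],
                "type_instance": vl.get("type_instance", ""),
                "dsname": name,
                "dstype": dstype,
                "value": value,
                "timestamp": int(vl["time"] * 1000),  # collectd sends epoch seconds
                **extra,
            })
    return docs
```

Flattening multi-valued entries (such as the load plugin's short, mid and long-term averages) into one document per value keeps the downstream ElasticSearch and InfluxDB schemas simple.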

Collectd Metrics and Plugins

Replacement Strategy
1. Use an existing collectd plugin (recommended)
   - Straightforward: the main logic can be reused
   - Many similarities at API level: registermetric() => register_read(), storesample() => dispatch()
2. Extend a standard collectd plugin
   - Requires development
3. Run the Lemon sensor using a collectd wrapper
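The API correspondence on the slide (registermetric() => register_read(), storesample() => dispatch()) maps onto collectd's Python plugin interface, the `collectd` module provided by collectd's `python` plugin. Below is a minimal sketch of a ported sensor, assuming a hypothetical Lemon sensor that reported the used fraction of a filesystem; the plugin name `fs_usage` is also hypothetical. The sampling logic is kept in a plain function so that it can be tested outside collectd, and the `collectd` import only succeeds when the file is loaded by collectd itself.

```python
import shutil

try:
    import collectd  # only available when loaded by collectd's python plugin
except ImportError:
    collectd = None

PATH = "/"  # hypothetical: filesystem the old Lemon sensor used to watch


def sample_used_fraction(path=PATH):
    """Sampling logic reused from the (hypothetical) Lemon sensor."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total if usage.total else 0.0


def read_callback():
    """Counterpart of Lemon's storesample(): dispatch one value to collectd."""
    vl = collectd.Values(plugin="fs_usage", type="percent",
                         type_instance=PATH.strip("/") or "root")
    vl.dispatch(values=[100.0 * sample_used_fraction()])


if collectd is not None:
    # Counterpart of Lemon's registermetric(): collectd calls read_callback every interval.
    collectd.register_read(read_callback)
```

In practice collectd's standard df plugin can already report filesystem usage, which is exactly the kind of case where option 1 (reuse an existing plugin) is preferable to porting.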

Alarms
- Components: collectd agents (local alarms), HTTP proxy, Kafka, Spark (advanced alarms), InfluxDB and Grafana (Grafana alarms), GNI (Snow/ )
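As a toy illustration of the difference between a bare threshold and a slightly more robust alarm, the function below fires only when a metric stays above a threshold for several consecutive samples, which avoids notifications on single spikes. It is a hypothetical sketch with arbitrary parameters; neither the actual Grafana alert rules nor the Spark-based advanced alarms are described on the slide.

```python
def consecutive_breaches(samples, threshold, min_consecutive=3):
    """Return the sample indices at which an alarm fires.

    An alarm fires when `min_consecutive` samples in a row exceed `threshold`;
    it is re-armed only after the metric drops back to or below the threshold.
    """
    fired = []
    run = 0
    for i, value in enumerate(samples):
        if value > threshold:
            run += 1
            if run == min_consecutive:
                fired.append(i)
        else:
            run = 0
    return fired
```

For example, `consecutive_breaches([1, 5, 6, 7, 8, 2, 9, 9, 9], 4)` fires at indices 3 and 8: one alarm per sustained breach, not one per sample.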

Reference and Contact
- Dashboards (CERN SSO login): monit.cern.ch
- Feedback/Requests (SNOW): cern.ch/monit-support
- Documentation: cern.ch/monitdocs


Additional Info and Backup Slides

Notebooks with Zeppelin
Extract data from HDFS or ES

Manipulate the data and plot with common languages and tools: Python, Scala, numpy

Notebooks with Swan
Extract data from ES; ROOT, Python, C++, CVMFS


DC/OS Metrics. (formerly known as Project Ambrose) Application and Resource Metrics in DC/OS Enterprise. Nick Parker at.. DC/OS Metrics (formerly known as Project Ambrose) Application and Resource Metrics in DC/OS Enterprise Nick Parker at.. 1 Introduction Nick Parker DC/OS Slack: chat.dcos.io DC/OS Mailing List: users@dcos.io

More information

Virtualization of the ATLAS Tier-2/3 environment on the HPC cluster NEMO

Virtualization of the ATLAS Tier-2/3 environment on the HPC cluster NEMO Virtualization of the ATLAS Tier-2/3 environment on the HPC cluster NEMO Ulrike Schnoor (CERN) Anton Gamel, Felix Bührer, Benjamin Rottler, Markus Schumacher (University of Freiburg) February 02, 2018

More information

Operating the Distributed NDGF Tier-1

Operating the Distributed NDGF Tier-1 Operating the Distributed NDGF Tier-1 Michael Grønager Technical Coordinator, NDGF International Symposium on Grid Computing 08 Taipei, April 10th 2008 Talk Outline What is NDGF? Why a distributed Tier-1?

More information

The CORAL Project. Dirk Düllmann for the CORAL team Open Grid Forum, Database Workshop Barcelona, 4 June 2008

The CORAL Project. Dirk Düllmann for the CORAL team Open Grid Forum, Database Workshop Barcelona, 4 June 2008 The CORAL Project Dirk Düllmann for the CORAL team Open Grid Forum, Database Workshop Barcelona, 4 June 2008 Outline CORAL - a foundation for Physics Database Applications in the LHC Computing Grid (LCG)

More information

70-414: Implementing an Advanced Server Infrastructure Course 01 - Creating the Virtualization Infrastructure

70-414: Implementing an Advanced Server Infrastructure Course 01 - Creating the Virtualization Infrastructure 70-414: Implementing an Advanced Server Infrastructure Course 01 - Creating the Virtualization Infrastructure Slide 1 Creating the Virtualization Infrastructure Slide 2 Introducing Microsoft System Center

More information

Datasheet FUJITSU Software ServerView Cloud Monitoring Manager V1.1

Datasheet FUJITSU Software ServerView Cloud Monitoring Manager V1.1 Datasheet FUJITSU Software ServerView Cloud Monitoring Manager V1.1 Datasheet FUJITSU Software ServerView Cloud Monitoring Manager V1.1 A Monitoring Cloud Service for Enterprise OpenStack Systems Cloud

More information

EMC Hybrid Cloud. Umair Riaz - vspecialist

EMC Hybrid Cloud. Umair Riaz - vspecialist EMC Hybrid Cloud Umair Riaz - vspecialist 1 The Business Drivers RESPOND FASTER TO DRIVE NEW REVENUE INCREASE AGILITY REFOCUS RESOURCES TOWARD BUSINESS VALUE INCREASE VISIBILITY & CONTROL 2 CLOUD TRANSFORMS

More information

Real-time Streaming Applications on AWS Patterns and Use Cases

Real-time Streaming Applications on AWS Patterns and Use Cases Real-time Streaming Applications on AWS Patterns and Use Cases Paul Armstrong - Solutions Architect (AWS) Tom Seddon - Data Engineering Tech Lead (Deliveroo) 28 th June 2017 2016, Amazon Web Services,

More information

Introducing Oracle Machine Learning

Introducing Oracle Machine Learning Introducing Oracle Machine Learning A Collaborative Zeppelin notebook for Oracle s machine learning capabilities Charlie Berger Marcos Arancibia Mark Hornick Advanced Analytics and Machine Learning Copyright

More information

and the GridKa mass storage system Jos van Wezel / GridKa

and the GridKa mass storage system Jos van Wezel / GridKa and the GridKa mass storage system / GridKa [Tape TSM] staging server 2 Introduction Grid storage and storage middleware dcache h and TSS TSS internals Conclusion and further work 3 FZK/GridKa The GridKa

More information

13th International Workshop on Advanced Computing and Analysis Techniques in Physics Research ACAT 2010 Jaipur, India February

13th International Workshop on Advanced Computing and Analysis Techniques in Physics Research ACAT 2010 Jaipur, India February LHC Cloud Computing with CernVM Ben Segal 1 CERN 1211 Geneva 23, Switzerland E mail: b.segal@cern.ch Predrag Buncic CERN E mail: predrag.buncic@cern.ch 13th International Workshop on Advanced Computing

More information

IN-MEMORY DATA FABRIC: Real-Time Streaming

IN-MEMORY DATA FABRIC: Real-Time Streaming WHITE PAPER IN-MEMORY DATA FABRIC: Real-Time Streaming COPYRIGHT AND TRADEMARK INFORMATION 2014 GridGain Systems. All rights reserved. This document is provided as is. Information and views expressed in

More information

Self-driving Datacenter: Analytics

Self-driving Datacenter: Analytics Self-driving Datacenter: Analytics George Boulescu Consulting Systems Engineer 19/10/2016 Alvin Toffler is a former associate editor of Fortune magazine, known for his works discussing the digital revolution,

More information

Is NiFi compatible with Cloudera, Map R, Hortonworks, EMR, and vanilla distributions?

Is NiFi compatible with Cloudera, Map R, Hortonworks, EMR, and vanilla distributions? Kylo FAQ General What is Kylo? Capturing and processing big data isn't easy. That's why Apache products such as Spark, Kafka, Hadoop, and NiFi that scale, process, and manage immense data volumes are so

More information

USM Anywhere AlienApps Guide

USM Anywhere AlienApps Guide USM Anywhere AlienApps Guide Updated April 23, 2018 Copyright 2018 AlienVault. All rights reserved. AlienVault, AlienApp, AlienApps, AlienVault OSSIM, Open Threat Exchange, OTX, Unified Security Management,

More information

Eduardo

Eduardo Eduardo Silva @edsiper eduardo@treasure-data.com About Me Eduardo Silva Github & Twitter Personal Blog @edsiper http://edsiper.linuxchile.cl Treasure Data Open Source Engineer Fluentd / Fluent Bit http://github.com/fluent

More information

Developing Microsoft Azure Solutions (70-532) Syllabus

Developing Microsoft Azure Solutions (70-532) Syllabus Developing Microsoft Azure Solutions (70-532) Syllabus Cloud Computing Introduction What is Cloud Computing Cloud Characteristics Cloud Computing Service Models Deployment Models in Cloud Computing Advantages

More information

MBS Microsoft Oracle Plug-In 6.82 User Guide

MBS Microsoft Oracle Plug-In 6.82 User Guide MBS Microsoft Oracle Plug-In 6.82 User Guide 10 Oracle Plug-In This version of the Oracle Plug-In supports Windows Agents. It is an add-on that allows you to perform database backups on Oracle databases.

More information

MONITORING SERVERLESS ARCHITECTURES

MONITORING SERVERLESS ARCHITECTURES MONITORING SERVERLESS ARCHITECTURES CAN YOU HELP WITH SOME PRODUCTION PROBLEMS? Your Manager (CC) Rachel Gardner Rafal Gancarz Lead Consultant @ OpenCredo WHAT IS SERVERLESS? (CC) theaucitron Cloud-native

More information

The Evolution of Big Data Platforms and Data Science

The Evolution of Big Data Platforms and Data Science IBM Analytics The Evolution of Big Data Platforms and Data Science ECC Conference 2016 Brandon MacKenzie June 13, 2016 2016 IBM Corporation Hello, I m Brandon MacKenzie. I work at IBM. Data Science - Offering

More information

Hadoop JMX Monitoring and Alerting

Hadoop JMX Monitoring and Alerting Hadoop JMX Monitoring and Alerting Introduction High-Level Monitoring/Alert Flow Metrics Collector Agent Metrics Storage NameNode Metrics DataNode Metrics HBase Master Metrics RegionServer Metrics Data

More information

Search and Time Series Databases

Search and Time Series Databases Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Search and Time Series Databases Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria

More information

Availability for the modern datacentre Veeam Availability Suite v9.5

Availability for the modern datacentre Veeam Availability Suite v9.5 Availability for the modern datacentre Veeam Availability Suite v9.5 Jan van Leuken System Engineer Benelux, Veeam Software jan.vanleuken@veeam.com +31 (0)615 83 50 64 Robin van der Steenhoven Territory

More information

Europeana Core Service Platform

Europeana Core Service Platform Europeana Core Service Platform DELIVERABLE D7.1: Strategic Development Plan, Architectural Planning Revision Final Date of submission 30 October 2015 Author(s) Marcin Werla, PSNC Pavel Kats, Europeana

More information

Managing, Monitoring, and Reporting Functions

Managing, Monitoring, and Reporting Functions This chapter discusses various types of managing, monitoring, and reporting functions that can be used with Unified CVP. It covers the following areas: Unified CVP Operations Console Server Management,

More information

Data Storage Infrastructure at Facebook

Data Storage Infrastructure at Facebook Data Storage Infrastructure at Facebook Spring 2018 Cleveland State University CIS 601 Presentation Yi Dong Instructor: Dr. Chung Outline Strategy of data storage, processing, and log collection Data flow

More information

Administering Microsoft SQL Server 2012 Databases

Administering Microsoft SQL Server 2012 Databases Course 10775A: Administering Microsoft SQL Server 2012 Databases Course Details Course Outline Module 1: Introduction to SQL Server 2012 and its Toolset This module introduces the entire SQL Server platform

More information

Data Analyst Nanodegree Syllabus

Data Analyst Nanodegree Syllabus Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working

More information

A Tutorial on Apache Spark

A Tutorial on Apache Spark A Tutorial on Apache Spark A Practical Perspective By Harold Mitchell The Goal Learning Outcomes The Goal Learning Outcomes NOTE: The setup, installation, and examples assume Windows user Learn the following:

More information

ntopng A Web-based Network Traffic Monitoring Application

ntopng A Web-based Network Traffic Monitoring Application ntopng A Web-based Network Traffic Monitoring Application New York City, NY June 14th, 2017 Simone Mainardi linkedin.com/in/simonemainardi Agenda About ntop Network traffic monitoring

More information

Evolution of the Data Center

Evolution of the Data Center Cisco on Cisco Evolution of the Data Center Global Cloud Strategy & Tetration John Manville, SVP, Cisco IT Jon Woolwine, Distinguished Engineer, Cisco IT Benny Van de Voorde, Principal Engineer, Cisco

More information

Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures. Hiroshi Yamaguchi & Hiroyuki Adachi

Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures. Hiroshi Yamaguchi & Hiroyuki Adachi Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures Hiroshi Yamaguchi & Hiroyuki Adachi About Us 2 Hiroshi Yamaguchi Hiroyuki Adachi Hadoop DevOps Engineer Hadoop Engineer

More information

Volunteer Computing at CERN

Volunteer Computing at CERN Volunteer Computing at CERN BOINC workshop Sep 2014, Budapest Tomi Asp & Pete Jones, on behalf the LHC@Home team Agenda Overview Status of the LHC@Home projects Additional BOINC projects Service consolidation

More information

Talend Big Data Sandbox. Big Data Insights Cookbook

Talend Big Data Sandbox. Big Data Insights Cookbook Overview Pre-requisites Setup & Configuration Hadoop Distribution Download Demo (Scenario) Overview Pre-requisites Setup & Configuration Hadoop Distribution Demo (Scenario) About this cookbook What is

More information

WHITEPAPER. MemSQL Enterprise Feature List

WHITEPAPER. MemSQL Enterprise Feature List WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure

More information