Database monitoring and service validation. Dirk Duellmann CERN IT/PSS and 3D

Similar documents
LHCb Distributed Conditions Database

The LCG 3D Project. Maria Girone, CERN. The 23rd Open Grid Forum - OGF23 4th June 2008, Barcelona. CERN IT Department CH-1211 Genève 23 Switzerland

The CORAL Project. Dirk Düllmann for the CORAL team Open Grid Forum, Database Workshop Barcelona, 4 June 2008

Database Services at CERN with Oracle 10g RAC and ASM on Commodity HW

LCG Conditions Database Project

Access to ATLAS Geometry and Conditions Databases

Oracle Database 11g: Real Application Testing & Manageability Overview

Experience of Data Grid simulation packages using.

Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Microsoft SQL Server Fix Pack 15. Reference IBM

Workload Management. Stefano Lacaprara. CMS Physics Week, FNAL, 12/16 April Department of Physics INFN and University of Padova

Ideas for evolution of replication CERN

Oracle Enterprise Manager. 1 Before You Install. System Monitoring Plug-in for Oracle Unified Directory User's Guide Release 1.0

GoldenGate Zbigniew Baranowski

Real-World Performance Training Core Database Performance

Online data storage service strategy for the CERN computer Centre G. Cancio, D. Duellmann, M. Lamanna, A. Pace CERN, Geneva, Switzerland

EMC Unisphere for VMAX Database Storage Analyzer

Jyotheswar Kuricheti

Optimizing SQL Transactions

System Administration of PTC Windchill 11.0

Exadata Database Machine Administration Workshop NEW

Evolution of Database Replication Technologies for WLCG

Oracle Streams. Michal Hanzeli DBI013

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Oracle Database 12c Performance Management and Tuning

Enterprise Manager: Scalable Oracle Management

SQL Gone Wild: Taming Bad SQL the Easy Way (or the Hard Way) Sergey Koltakov Product Manager, Database Manageability

Oracle DBA workshop I

Evolution of Database Replication Technologies for WLCG

LCG data management at IN2P3 CC FTS SRM dcache HPSS

Spanish Tier-2. Francisco Matorras (IFCA) Nicanor Colino (CIEMAT) F. Matorras N.Colino, Spain CMS T2,.6 March 2008"

DR 3.3 Planning Guide

Monitoring Agent for Unix OS Version Reference IBM

The LHC Computing Grid

EMC Unisphere for VMAX Database Storage Analyzer

<Insert Picture Here> MySQL Cluster What are we working on

Exadata Database Machine Administration Workshop

Investigation on Oracle GoldenGate Veridata for Data Consistency in WLCG Distributed Database Environment

Event cataloguing and other database applications in ATLAS

Oracle Enterprise Manager 12c IBM DB2 Database Plug-in

ATLAS Oracle database applications and plans for use of the Oracle 11g enhancements

Oracle WebLogic Server 12c: Administration I

COOL performance optimization using Oracle hints

Performance and Scalability with Griddable.io

MySQL Enterprise Monitor Manual

PoS(EGICF12-EMITC2)106

ORACLE RAC DBA COURSE CONTENT

Computing. DOE Program Review SLAC. Rainer Bartoldus. Breakout Session 3 June BaBar Deputy Computing Coordinator

Oracle Database 12c Rel. 2 Cluster Health Advisor - How it Works & How to Use it

Monitoring Agent for SAP Applications Fix pack 11. Reference IBM

Managing Oracle9iAS Forms Services Using Oracle Enterprise Manager. An Oracle White Paper April 2002

MySQL Enterprise Monitor Manual

sflow Elisa Jasinska

Oracle WebCenter Portal Performance Tuning

Tablespace Usage By Schema In Oracle 11g Rac

Managing Oracle Real Application Clusters. An Oracle White Paper January 2002

Monitoring Agent for Microsoft Hyper-V Server Fix Pack 12. Reference IBM

Fit for Purpose Platform Positioning and Performance Architecture

EMC Unisphere for VMAX Database Storage Analyzer

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez

<Insert Picture Here> Looking at Performance - What s new in MySQL Workbench 6.2

Oracle 11g: RAC and Grid Infrastructure Administration Accelerated Release 2

Oracle 10g and IPv6 IPv6 Summit 11 December 2003

RAC parameter configuration overview

Carlos Fernando Gamboa, BNL Andrew Wong, TRIUMF. WLCG Collaboration Workshop, CERN Geneva, April 2008.

CERN network overview. Ryszard Jurga

Oracle Database 10g: New Features for Administrators Release 2

BAC Guidelines. Dr. Martin Gajdzik Team Lead BTO Operations & BSM Solution Architect HPSW Professional Services

WLCG Network Throughput WG

Challenges of the LHC Computing Grid by the CMS experiment

Oracle EXAM - 1Z Oracle Exadata Database Machine Administration, Software Release 11.x Exam. Buy Full Product

White Paper. Altiris Design and Scalability Guide 10/7/2004

Status of KISTI Tier2 Center for ALICE

Oracle APEX 18.1 New Features

Real-time Monitoring, Inventory and Change Tracking for. Track. Report. RESOLVE!

IBM SmartCloud Application Performance Management UI Version User's Guide IBM SC

Oracle 11g: RAC and Grid Infrastructure Administration Accelerated Release 2

B. Pack -domain=c:\oracle\user_projects\domains\mydomain.jar -template=c:\oracle\userj:emplates\mydomain -template_name=nmy WebLogic Domain"

Oracle Database 10G. Lindsey M. Pickle, Jr. Senior Solution Specialist Database Technologies Oracle Corporation

Contact Center Assurance Dashboards

Oracle Database 12c: Performance Management and Tuning

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

MONitoring Agents using a Large Integrated Services Architecture. Iosif Legrand California Institute of Technology

WebLogic Server- Tips & Tricks for Troubleshooting Performance Issues. By: Abhay Kumar AST Corporation

Oracle Streams. An Oracle White Paper October 2002

<Insert Picture Here> Managing Oracle Exadata Database Machine with Oracle Enterprise Manager 11g

The evolving role of Tier2s in ATLAS with the new Computing and Data Distribution model

The Wuppertal Tier-2 Center and recent software developments on Job Monitoring for ATLAS

I Tier-3 di CMS-Italia: stato e prospettive. Hassen Riahi Claudio Grandi Workshop CCR GRID 2011

End-to-end Management with Grid Control. John Abrahams Technology Sales Consultant Oracle Nederland B.V.

Toad for Oracle Suite 2017 Functional Matrix

RDMS CMS Computing Activities before the LHC start

Monitoring & Tuning Azure SQL Database

Oracle Database 12c R2: Administration Workshop Ed 3 NEW

The ATLAS EventIndex: Full chain deployment and first operation

VMware vrealize Operations for Horizon Administration. Modified on 3 JUL 2018 VMware vrealize Operations for Horizon 6.4

A short introduction to the Worldwide LHC Computing Grid. Maarten Litmaath (CERN)

Optimizing GIS Services: Scalability & Performance. Firdaus Asri

Consolidating Enterprise Performance Analytics

How IBM Can Identify z/os Networking Issues without tracing

Transcription:

Database monitoring and service validation Dirk Duellmann CERN IT/PSS and 3D http://lcg3d.cern.ch

LCG Database Deployment Plan After October 05 workshop a database deployment plan has been presented to LCG GDB and MB http://agenda.cern.ch/fullagenda.php?ida=a057112 Two production phases March - Oct 06 : partial production service Production service (parallel to existing testbed) H/W requirements defined by experiments/projects Based on Oracle 10gR2 Subset of LCG tier 1 sites: ASCC, CERN, BNL, CNAF, GridKA, IN2P3, RAL Oct 06- onwards : full production service Adjusted h/w requirements (defined at summer 06 workshop) Other tier 1 sites joined in: PIC, NIKHEF, NDG, TRIUMF LCG 3D Status Dirk Duellmann 2

LCG 3D Service Architecture Oracle Streams http cache (SQUID) Cross DB copy & MySQL/SQLight Files T0 - autonomous reliable service O S M T1- db back bone - all data replicated - reliable service S O Online DB -autonomous reliable service O F O S T2 - local db cache -subset data -only local service M S R/O Access at Tier 1/2 (at least initially) LCG 3D Status Dirk Duellmann 3

Two Main Technologies - Databases & FroNTier Need to provide availability information and diagnostics to different audiences Database Administrators at Tier 0 and Tier 1 Experiment responsibles Grid Deployment team Granularity required is very different Approach proposed: Start from detailed (DBA level) monitoring at Tier sites Extract / aggregate higher level information for experiment and grid dashboards Add site test jobs (eg COOL service test) as experiment production is starting LCG 3D Status Dirk Duellmann 4

T0 and T1 Database Monitoring Web based collector and user interface available as part of Oracle s/w Oracle Enterprise Manager (aka Oracle Grid Control) A central Oracle Enterprise Manager repository at CERN has been setup to collect the status and detailed diagnostics of all 3D production clusters Some sites will in parallel have the information integrated into their site local OEM setups Allows to drill down to individual queries and users To be used by site DBAs All Tier 1 sites are requested to join now RAL is driving the test and documented procedure LCG 3D Status Dirk Duellmann 5

DB Resource Usage Lemon Integration & Weekly Reports For CERN Tier 0 we supply also LEMON probes Aggregate DB resources usage by experiment application To be used by experiment responsible Integrating now with new LEMON service concept Color coded availability and service capacity display Preparing weekly resource reports to experiments and grid deployment Amount of CPU & I/O used per DB application Pointing out problematic DB sessions Very short session, long but idle sessions Very many session from a single user or host Ineffective application implementations (no bind-variable, no indices) Used with experiment / project responsible for service sizing LCG 3D Status Dirk Duellmann 6

STREAMS Architecture CAPTURE PROCESS LCRs SOURCE QUEUE capture propagation apply capture changes propagate events SOURCE DATABASE log changes user changes REDO LOG DESTINATION QUEUE LCRs APPLY PROCESS apply changes TARGET DATABASE (replica) LCG 3D Status23 February 2006 Dirk Duellmann3D Meeting --- Luis Ramos 7

STREAMS monitoring Home-made scripts capture, propagation and apply status queues status processes statistics STRMMON: Oracle Streams monitor tool overview of the Streams activity Health Check report information on the setup and operation of Streams LCG 3D Status23 February 2006 Dirk Duellmann3D Meeting --- Luis Ramos 8

STREAMS monitoring Display general information about each capture process --------------------------------------------------------------------------- Redo Total Capture Serial Entries LCRs Name Number ID Number State Scanned Enqueued ------------------------------------ ------------- ------ ----------- ----------------------------------- ----------------- ---------------- STRMADMIN_CAPTURE C001 136 7 CAPTURING CHANGES 13394731 705854 STREAMS Monitor, v 2.2 Copyright Oracle Corp. 2002, 2005. Interval = 3, Count=1000 Logon= @ ORACLE 10.2.0.2.0 Streams Pool Size = 752M LOG : <redo generated per sec> NET: <client bytes per sec> <dblink bytes per sec> Cxxx: <lcrs captured per sec> <lcrs enqueued per sec> <capture latency> MEM : <percent of memory used> % <streams pool size> PRxx: <messages received per sec> Qx : <msgs enqueued per sec> <msgs spilled per sec> PSxx: <lcrs propagated per sec> <bytes propaged per sec> Axxx: <lcrs applied per sec> <txns applied per sec> <dequeue latency> <F>: flow control in effect <B>: potential bottleneck <x%i x%f x%xx>: <idle wait events percentage> <flow control wait events percentage> <other wait event percentage and name> xx->: database instance name 2006-06-6 16:25:26 d3r1-> MEM 6 % 752M 2006-06-6 16:25:26 d3r1-> LOG 512 NET 6K 0 <B> C001 0 0 3sec <0%I 0%F -> Q46190 0 0 PS01 0 0 0 <89%I 0%F -> PS02 0 0 0 <0%I 0%F -> MEM 6 % 752M 2006-06-6 16:25:29 d3r1-> LOG 0 NET 6K 0 <F> C001 0 0 3sec <0%I 0%F -> Q46190 0 0 PS01 0 0 0 <100%I 0%F -> PS02 0 0 0 <0%I 0%F -> MEM 6 % 752M LCG 3D Status23 February 2006 Dirk Duellmann3D Meeting --- Luis Ramos 9

Detailed Streams Status Developing a detailed stream status display together with ARDA/EIS Filling a gap in Oracle Enterprise Manager Plots for replication load (at the source DB) and individual site throughput and backlog Exercised as part of the replication throughput test with the Tier 1 sites Aim is to provide a few metrics to existing monitoring setups (eg experiment and grid dashboards) LCG 3D Status Dirk Duellmann 10

Frontier Production configuration LCG 3D Status23 February 2006 Dirk Duellmann3D Meeting --- Luis Ramos 11

FroNTier/Squid Monitoring SNMP based setup has been developed by FNAL probes Squid cache availability and usage was used during the CMS FroNTier tests For current LCG 3D pre-production phase this setup has been copied Currently run by CMS and hosted at FNAL May need to re-discuss if more experiments will require FroNTier Being integrated into CMS dashboard by ARDA/EIS team LCG 3D Status Dirk Duellmann 12

Monitoring Squids W/ SNMP interface (MRTG plots shown) Squid@CERN Requests/fetches Squid@CIEMAT Requests/fetches Squid@CERN In/out Squid@CIEMAT In/out Test: 20 Parallel CMSSW Clients @ CIEMAT LCG 3D Status Dirk Duellmann 13

Client Side Monitoring LCG Persistency Framework builds on top of general database abstraction layer CORAL POOL and COOL use this layer for all database access Oracle, MySQL, SQLight and FroNTier are back-end plugins CORAL implements detailed monitoring of perceived latency of connection and query timing Information would be useful to complement server side Discussed interface to Mona Lisa with CMS So far no collection/aggregation of this information due to lack of resources LCG 3D Status Dirk Duellmann 14

Object Diagram Application IRelationaService IRelationaSession DB Reporter IMonitoringService CSV Reporter XML Reporter IMonitoring FrontierAccess CORAL PLUG-IN MySQLAccess CORAL PLUG-IN OracleAccess CORAL PLUG-IN LCG 3D Status Dirk Duellmann 15

Summary Database services are well established at Tier 0 and being setup and tested at Tier 1s Database monitoring has to cover very different granularity (Very) Detailed monitoring available at Tier 0 and soon all Tier 1s with Oracle Enterprise Manager Working on providing relevant summaries/aggregated results for existing experiments and grid deployment Some iteration on selection the right subset of metrics will be required as the production use by the experiments is ramping up Higher level site validation tests from grid jobs will need to be added as soon as experiment test start Plan to align with the existing infrastructure on the deployment side for these tests LCG 3D Status Dirk Duellmann 16