Database Services at CERN for the Physics Community, CERN Orcan Conference, Stockholm, May 2010

Outline: Overview of CERN and computing for LHC; Database services at CERN; DB service architecture; DB service operations and monitoring; Service evolution. 2

What is CERN? CERN is the world's largest particle physics centre. Particle physics is about elementary particles and fundamental forces. Particle physics requires special tools to create and study new particles: ACCELERATORS, huge machines able to speed up particles to very high energies before colliding them into other particles, and DETECTORS, massive instruments which register the particles produced when the accelerated particles collide. CERN is ~2500 staff scientists (physicists, engineers, ...) and some 6500 visiting scientists (half of the world's particle physicists), coming from 500 universities representing 80 nationalities. 3

LHC: a Very Large Scientific Instrument. The LHC is 27 km long and 100 m underground; the aerial view on the slide locates ATLAS, CMS+TOTEM and ALICE between downtown Geneva and Mont Blanc (4810 m). 4

Based on Advanced Technology 27 km of superconducting magnets cooled in superfluid helium at 1.9 K 5

The ATLAS experiment: 7000 tons, 150 million sensors generating data 40 million times per second, i.e. a petabyte/s. 6

7 TeV Physics with LHC in 2010 7

The LHC Computing Grid 8

A collision at LHC 9

The Data Acquisition 10

Tier 0 at CERN: Acquisition, first-pass processing, storage & distribution. 1.25 GB/sec (ions). 11

The LHC Computing Challenge. Signal/Noise: ~10^-9. Data volume: high rate * large number of channels * 4 experiments = 15 PetaBytes of new data each year. Compute power: event complexity * number of events * thousands of users = 100k of (today's) fastest CPUs and 45 PB of disk storage. Worldwide analysis & funding: computing funded locally in major regions & countries, efficient analysis everywhere, GRID technology. Bulk of data stored in files, a fraction of it in databases (~30 TB/year). 12

LHC data: the data produced each year correspond to about 20 million CDs, a CD stack ~20 km high (compared on the slide with a balloon at 30 km, Concorde at 15 km and Mt. Blanc at 4.8 km). Where will the experiments store all of these data? 13

Tier 0 / Tier 1 / Tier 2. Tier-0 (CERN): data recording, initial data reconstruction, data distribution. Tier-1 (11 centres): permanent storage, re-processing, analysis. Tier-2 (~130 centres): simulation, end-user analysis. 14

Databases and LHC. Relational DBs play a key role today in the LHC production chains: online acquisition, offline production, data (re)processing, data distribution and analysis (SCADA, conditions, geometry, alignment, calibration, file bookkeeping, file transfers, etc.); Grid infrastructure and operation services (monitoring, dashboards, user-role management, ...); data management services (file catalogues, file transfers and storage management); metadata and transaction processing for the custom tape storage system of physics data; accelerator logging and monitoring systems. 15

DB Services and Architecture

CERN Databases in Numbers. CERN database services, global numbers: a global user community of several thousand users; ~100 Oracle RAC database clusters (2-6 nodes); currently over 3300 disk spindles providing more than 1 PB raw disk space (NAS and SAN). Some notable DBs at CERN: experiment databases, 13 production databases, currently between 1 and 9 TB in size, expected growth between 1 and 19 TB/year; the LHC accelerator logging database (ACCLOG), ~30 TB, expected growth up to 30 TB/year; several more DBs in the range 1-2 TB. 17

Service Key Requirements. Data availability, scalability, performance and manageability: Oracle RAC on Linux, the building-block architecture for CERN and Tier1 sites. Data distribution: Oracle Streams, for sharing information between databases at CERN and 10 Tier1 sites. Data protection: Oracle RMAN on TSM for backups; Oracle Data Guard for additional protection against failures (data corruption, disaster recovery, ...). 18

Hardware architecture. Servers: commodity hardware (Intel Harpertown- and Nehalem-based mid-range servers) running 64-bit Linux; rack-mounted boxes and blade servers. Storage: different storage types used, NAS (Network-Attached Storage) over 1 Gb Ethernet and SAN (Storage Area Network) over 4 Gb FC; different disk drive types: high-capacity SATA (up to 2 TB), high-performance SATA, high-performance FC. 19

High Availability. Resiliency from HW failures while using commodity HW: redundancies with software. Intra-node redundancy: redundant IP network paths (Linux bonding), redundant Fibre Channel paths to storage, OS configuration with Linux's device mapper. Cluster redundancy: Oracle RAC + ASM. Monitoring: custom monitoring and alarms to on-call DBAs. Service continuity: physical standby (Dataguard). Recovery operations: on-disk backup and tape backup. 20

DB clusters with RAC. Applications are consolidated on large clusters per customer (e.g. experiment). Load balancing and growth: leverages Oracle services. HA: the cluster survives node failures. Maintenance: allows scheduled rolling interventions. The slide diagram shows several applications (Prodsys, COOL, TAGS, Shared_1, Shared_2, Integration) spread over cluster nodes, each node running a listener, a DB instance and an ASM instance on top of Clusterware. 21

Oracle's ASM. ASM (Automatic Storage Management). Cost: Oracle's own cluster file system and volume manager for Oracle databases. HA: online storage reorganization/addition. Performance: striping and mirroring of everything. Commodity HW: Physics DBs at CERN use ASM normal redundancy (similar to RAID 1+0 across multiple disks and storage arrays). The diagram shows a DATA disk group and a RECOVERY disk group spread across four storage arrays. 22

Storage deployment. Two diskgroups are created for each cluster: DATA (data files and online redo logs; outer part of the disks) and RECO (flash recovery area destination; archived redo logs and on-disk backups; inner part of the disks). One failgroup per storage array. The diagram shows DATA_DG1 and RECO_DG1 laid out across Failgroup1-Failgroup4. 23
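As an illustration, a minimal sketch of how such diskgroups can be created from the ASM instance; the disk discovery paths and diskgroup names below are placeholders, not the actual CERN configuration, and each failgroup maps to one storage array so that ASM normal redundancy mirrors across arrays:

  -- run on the ASM instance (SYSDBA in 10g, SYSASM in 11g); device paths are hypothetical
  CREATE DISKGROUP DATA_DG1 NORMAL REDUNDANCY
    FAILGROUP storage_array1 DISK '/dev/mpath/array1_data*'
    FAILGROUP storage_array2 DISK '/dev/mpath/array2_data*';

  CREATE DISKGROUP RECO_DG1 NORMAL REDUNDANCY
    FAILGROUP storage_array1 DISK '/dev/mpath/array1_reco*'
    FAILGROUP storage_array2 DISK '/dev/mpath/array2_reco*';

  -- point the database to the two diskgroups
  ALTER SYSTEM SET db_create_file_dest        = '+DATA_DG1' SCOPE=SPFILE;
  ALTER SYSTEM SET db_recovery_file_dest_size = 5000G       SCOPE=SPFILE;
  ALTER SYSTEM SET db_recovery_file_dest      = '+RECO_DG1' SCOPE=SPFILE;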

Physics DB HW, a typical setup: dual-CPU quad-core DELL 2950 servers, 16 GB memory, Intel 5400-series Harpertown at 2.33 GHz clock; dual power supplies, mirrored local disks, 4 NICs (2 private / 2 public), dual HBAs, RAID 1+0-like redundancy with ASM. 24

ASM scalability test results. A big Oracle 10g RAC cluster was built with 14 mid-range servers; 26 storage arrays were connected to all servers and a big ASM diskgroup created (>150 TB of raw storage). Data-warehouse-like workload (parallelized query on all test servers). Measured sequential I/O: read 6 GB/s, read-write 3+3 GB/s. Measured 8 KB random I/O: read 40,000 IOPS. Result: commodity hardware can scale on Oracle RAC. 25
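The exact test harness is not part of the slides; a hypothetical sketch of the kind of parallelized full-scan query used to drive sequential I/O in such a data-warehouse-like test (table name and degree of parallelism are made up):

  -- full scan fanned out across the RAC nodes via parallel query
  SELECT /*+ FULL(t) PARALLEL(t, 64) */ COUNT(*), MIN(event_id), MAX(event_id)
  FROM   test_big_table t;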

Tape backups. The main safety net against failures. Despite the associated cost, they have many advantages: tapes can be easily taken offsite; backups, once properly stored on tape, are quite reliable; if configured properly, they can be very fast. The diagram shows RMAN clients sending the backup payload from the database through the media manager server to the tape drives, while metadata flows to the media manager (MM) library. 26

Oracle backups. Oracle RMAN (Recovery Manager): an integrated backup and recovery solution. Backups to tape (over LAN): the fundamental way of protecting databases against failures; downside: it takes days to backup/restore multi-TB databases. Backups to disk (RMAN): daily updates of the copy using incremental backups; the on-disk copy is kept at least one day behind, so it can be used to address logical corruptions; very fast recovery when primary storage is corrupted (switch to the image copy or recover from the copy). Note: this is a cheap alternative/complement to a standby DB. 27
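A minimal sketch of the standard RMAN incrementally-updated-backup pattern behind such an on-disk copy kept one day behind (the tag name and the one-day lag are placeholders):

  # daily job: first roll the image copy forward to where the database was 24h ago,
  # then take a new level 1 incremental for the next cycle
  run {
    recover copy of database with tag 'ondisk_copy' until time 'SYSDATE-1';
    backup incremental level 1 for recover of copy with tag 'ondisk_copy' database;
  }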

Tape B&R strategy. Incremental backup strategy example:
Full backups every two weeks:
  backup force tag 'full_backup_tag' incremental level 0 check logical database plus archivelog;
Incremental cumulative every 3 days:
  backup force tag 'incr_backup_tag' incremental level 1 cumulative for recover of tag 'last_full_backup_tag' database plus archivelog;
Daily incremental differential backups:
  backup force tag 'incr_backup_tag' incremental level 1 for recover of tag 'last_full_backup_tag' database plus archivelog;
Hourly archivelog backups:
  backup tag 'archivelog_backup_tag' archivelog all;
Monthly automatic test restore. 28
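For the monthly automatic test restore, a hedged sketch of an RMAN validation run (it reads the backup pieces back from tape and checks them, without writing anything over the live datafiles):

  # validation only: backup pieces are read and checked, datafiles are not touched
  run {
    restore database validate check logical;
    restore archivelog all validate;
  }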

Backup & Recovery. On-tape backups are fundamental for protecting data, but recoveries run at ~100 MB/s (~30 hours to restore the datafiles of a 10 TB DB), which is very painful for an experiment in data-taking. On-disk image copies of the DBs were put in place: able to recover to any point in time within the last 48 hours of activity, with a recovery time independent of DB size. 29
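A sketch of the fast-recovery path mentioned above, assuming the database is mounted and the on-disk image copy is usable (SWITCH makes the copies become the live datafiles, so no restore from tape is needed):

  # switch to the image copies, roll them forward with incrementals/archivelogs, open
  run {
    switch database to copy;
    recover database;
    sql 'alter database open';
  }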

CERN implementation of MAA. The diagram shows users and applications connecting over WAN/Intranet to the primary RAC database, protected by RMAN backups and by a physical standby RAC database. 30

Dataguard Service Continuity. Based on proven physical standby technology. Protects from corruption of critical production DBs (disaster recovery); the standby DB apply is delayed 24h (protection from logical corruption). Other uses of standby DBs: standby DBs can be temporarily activated for testing (Oracle flashback allows simple re-instantiation of the standby after the test); standby DB copies are used to minimize time for major changes; a standby allows to create and keep up-to-date a mirror copy of production. HW migrations: the physical standby provides a fall-back solution after the migration. Release upgrades: the physical standby is broken after the intervention. 31
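For illustration, a hedged sketch of the two mechanisms above; exact commands depend on the version and on whether the Data Guard Broker is used, and the snapshot-standby conversion shown here is the 11g packaging of the flashback-based test/re-instantiate cycle:

  -- on the standby: redo apply with a 24-hour delay (1440 minutes)
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DELAY 1440 DISCONNECT FROM SESSION;

  -- temporarily activate the standby for testing, then discard the changes
  ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;
  -- ... run the tests, then:
  ALTER DATABASE CONVERT TO PHYSICAL STANDBY;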

Software Technologies: replication. Oracle Streams data replication technology. CERN -> CERN replication: provides production systems isolation. CERN -> Tier1s replication: enables data processing in the Worldwide LHC Computing Grid. 32

Downstream Capture. Downstream capture de-couples the Tier 0 production databases from destination or network problems; source database availability is the highest priority. Redo log retention on the downstream database is tuned to allow for a sufficient re-synchronisation window; we use 5 days of retention to avoid tape access. A fresh copy of the dictionary is dumped to redo periodically. 10.2 Streams recommendations (Metalink note 418755). The diagram shows redo logs shipped via the redo transport method from the source database to the downstream database, where capture runs, and changes propagated and applied on the target database. 33
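The "dump fresh copy of dictionary to redo" step corresponds to the DBMS_CAPTURE_ADM.BUILD call; a minimal sketch, to be run on the source database (the periodic scheduling is left out):

  -- writes a fresh copy of the data dictionary into the redo stream,
  -- so a new downstream capture process can be instantiated from it
  SET SERVEROUTPUT ON
  DECLARE
    build_scn NUMBER;
  BEGIN
    DBMS_CAPTURE_ADM.BUILD(first_scn => build_scn);
    DBMS_OUTPUT.PUT_LINE('Dictionary dumped to redo at SCN ' || build_scn);
  END;
  /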

Monitoring and Operations

Application Deployment Policy. Policies for hardware, DB versions and application testing. Application release cycle: development service -> validation service -> production service. Database software release cycle: production service on version n -> validation service on version n+1 -> production service on version n+1. 35

Patching and Upgrades. Databases are used by a world-wide community: arranging for scheduled interventions (s/w and h/w upgrades) requires quite some effort. Services need to be operational 24x7. Service downtime is minimized with rolling upgrades and use of stand-by databases: 0.04% services unavailability = 3.5 hours/year; 0.12% server unavailability = 9.5 hours/year (patch deployment, hardware). 36

DB Services Monitoring. Grid Control is extensively used for performance tuning, by DBAs and application power users. Custom applications: measurement of service availability, integrated with email and SMS to on-call; Streams monitoring; backup job scheduling and monitoring; ASM and storage failure monitoring. Other ad-hoc alarms are created and activated when needed, for example if a repeated bug hits production and several parameters need to be checked as a work-around. A weekly report on the performance and capacity used in the production DBs is sent to the application owners. 37

Oracle EM and Performance Troubleshooting. Our experience: it simplifies tasks and leads to a correct methodology for most tuning tasks. 38

3D Streams Monitor 39

AWR repository for capacity planning. We keep a repository, built from AWR, of the metrics of interest (IOPS, CPU, etc.). 40
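A hedged example of the kind of query used to harvest these metrics from the AWR history views into a longer-term repository; the statistics are cumulative counters, so consecutive snapshots must be subtracted to obtain IOPS:

  -- cumulative I/O request counters per AWR snapshot and instance
  SELECT sn.begin_interval_time,
         st.instance_number,
         st.stat_name,
         st.value
  FROM   dba_hist_sysstat  st
  JOIN   dba_hist_snapshot sn
         ON  st.snap_id         = sn.snap_id
         AND st.dbid            = sn.dbid
         AND st.instance_number = sn.instance_number
  WHERE  st.stat_name IN ('physical read total IO requests',
                          'physical write total IO requests')
  ORDER  BY sn.begin_interval_time, st.instance_number, st.stat_name;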

Storage monitoring. ASM instance-level monitoring and storage-level monitoring (the slide shows example alerts: a new failing disk on RSTOR614, a new disk installed on RSTOR903 slot 2). 41
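At the ASM instance level, a simple check of this kind can feed the custom alarms to the on-call DBAs; a sketch (thresholds and alerting are left out):

  -- any disk that is offline or not in NORMAL state is a candidate for an alarm
  SELECT group_number, disk_number, name, path, state, mode_status
  FROM   v$asm_disk
  WHERE  mode_status <> 'ONLINE' OR state <> 'NORMAL';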

Security. Schemas are set up with the least required privileges: the account owner is only used for application upgrades; reader and writer accounts are used by the applications; a password verification function enforces strong passwords. Firewalls filter DB connectivity: the CERN firewall and a local iptables firewall. Oracle CPU patches, more recently PSUs: production kept up-to-date after a validation period, with the policy agreed with users. Custom development: audit-based log analysis and alarms; an automatic password cracker to check password weakness. 42
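A hedged sketch of the schema setup described above (all names and the password-verification function are placeholders): an owner schema used only for upgrades, plus reader and writer accounts carrying only the privileges the applications need, under a profile that enforces the password policy:

  -- my_verify_function is a hypothetical password-complexity function
  CREATE PROFILE app_users LIMIT
    PASSWORD_VERIFY_FUNCTION my_verify_function
    FAILED_LOGIN_ATTEMPTS    10;

  CREATE USER app_writer IDENTIFIED BY "ChangeMe_2010" PROFILE app_users;
  CREATE USER app_reader IDENTIFIED BY "ChangeMe_2010" PROFILE app_users;

  -- least required privileges on the owner's objects
  GRANT CREATE SESSION TO app_writer, app_reader;
  GRANT SELECT, INSERT, UPDATE, DELETE ON app_owner.events TO app_writer;
  GRANT SELECT                         ON app_owner.events TO app_reader;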

DBAs and Data Service Management. CERN DBAs' activities and responsibilities cover a broad range of the technology stack; this comes naturally with Oracle RAC and ASM on Linux, in particular by leveraging the lower complexity of commodity HW. The most important part of the job is still the interaction with the customers: know your data and applications! Advantage: DBAs can have a full view of the DB service, from the applications down to the servers. 43

Evolution of the Services and Lessons Learned

Upgrade to 11gR2. The next big change to our services. Currently waiting for the first patchset to open the development and validation cycle; production upgrades to be scheduled with customers. Many new features of high interest (some already present in 11gR1): Active Dataguard, Streams performance improvements, ASM manageability improvements for normal redundancy, advanced compression. 45

Active Dataguard. Oracle standby databases can be used for read-only operations; this opens many new architectural options. We plan to use Active Dataguard instead of Streams for online-to-offline replication and to offload the production DBs for read-only operations. Comment: Active Dataguard and RAC have a considerable overlap when planning an HA configuration. We are looking forward to putting this in production. 46
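In 11g, enabling the read-only use of a physical standby (real-time query, the Active Data Guard option) amounts to opening it read-only while redo apply continues; a minimal sketch on the standby:

  -- stop apply, open read-only, restart real-time apply
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
  ALTER DATABASE OPEN READ ONLY;
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
    USING CURRENT LOGFILE DISCONNECT FROM SESSION;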

ASM Improvements in 11gR2. Rebalancing tests showed big performance improvements (a factor four gain); the excessive re-partnering seen in 10g is fixed in 11gR2. The integration of CRS and ASM simplifies administration. The introduction of Exadata, which uses ASM in normal redundancy, means development benefits standard configurations too. ACFS (the cluster file system based on ASM) performance: faster than ext3. 47

Streams 11gR2. Several key improvements: throughput and replication performance have improved considerably (10x improvements in our production-like tests); automatic split and merge procedure; Compare and Converge procedures. 48

Architecture and HW. Server cost/performance keeps improving: multicore CPUs and large amounts of RAM; CPU-RAM throughput and scalability are also improving (e.g. 64 cores and 64 GB of RAM are in the commodity HW price range). Storage and interconnect technologies are less straightforward in the commodity HW world. Topics of interest for us: SSDs, SAN vs. NAS, 10 Gbps Ethernet, 8 Gbps FC. 49

Backup challenges. Backup/recovery over LAN is becoming a problem with databases exceeding tens of TB: days are required to complete a backup or recovery. Some storage managers support so-called LAN-free backup: the backup data flows to the tape drives directly over the SAN, and the media management server is used only to register backups. Very good performance was observed during tests (FC saturation, e.g. 400 MB/s). An alternative is using 10 Gb Ethernet. The diagram shows metadata flowing over 1 GbE to the media manager server, while the backup data flows from the database directly over FC to the tape drives. 50

Data Life Cycle Management. Several physics applications generate very large data sets and have the need to archive data. Performance-based: online data is more frequently accessed. Capacity-based: old data can be read-only, rarely accessed, and in some cases can be put online on demand. Technologies: Oracle Partitioning (mainly range partitioning by time); application-centric (tables split and metadata maintained by the application); Oracle compression; archive DB initiative (offline old partitions/chunks of data into a separate archive DB). 51
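A hedged illustration of range partitioning by time combined with compression of the older, read-mostly partitions; table and column names are hypothetical:

  CREATE TABLE event_data (
    event_time  DATE           NOT NULL,
    channel_id  NUMBER         NOT NULL,
    payload     VARCHAR2(4000)
  )
  PARTITION BY RANGE (event_time) (
    PARTITION p_2009 VALUES LESS THAN (TO_DATE('2010-01-01','YYYY-MM-DD')) COMPRESS,
    PARTITION p_2010 VALUES LESS THAN (TO_DATE('2011-01-01','YYYY-MM-DD'))
  );

  -- an old partition can later be compressed in place and effectively archived
  ALTER TABLE event_data MOVE PARTITION p_2009 COMPRESS;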

Conclusions. We have set up a world-wide distributed database infrastructure for the LHC Computing Grid. The enormous challenges of providing robust, flexible and scalable DB services to the LHC experiments have been met using a combination of Oracle technology and operating procedures. Notable Oracle technologies: RAC, ASM, Streams, Data Guard. Relevant monitoring and procedures have been developed in-house. Going forward: the challenge of fast-growing DBs, the upgrade to 11.2, and leveraging new HW technologies. 52

Acknowledgments. CERN-IT DB group and in particular: Jacek Wojcieszuk, Dawid Wojcik, Eva Dafonte Perez, Maria Girone. More info: http://cern.ch/it-dep/db http://cern.ch/canali 53