Observational DataBase (ODB*) and its usage at ECMWF

Similar documents
Anne Fouilloux. Fig. 1 Use of observational data at ECMWF since CMA file structure.

Using ODB at ECMWF. Piotr Kuchta Sándor Kertész. Development Section ECMWF. Slide 1. MOS Workshop, 2013 November 18-20, ECMWF

Observation feedback archiving in MARS. Acknowledgement:

Metview 4 ECMWF s next generation meteorological workstation

Meteorology and Python

HIRVDA(C2M) HIRVDA(MIN) HIRVDA(MDCO) HIRVDA(M2C)

eccharts and Metview 4 2 new visualisation systems at ECMWF

Report on the COPE technical meeting held at ECMWF, Reading 9-12, June 2014

Introduction to Metview

Metview 4 ECMWF s latest generation meteorological workstation

Metview 4 ECMWF s latest generation meteorological workstation

Metview Introduction

BUFR decoding. Dominique Lucas User Support. February Intro Bufr decoding

Sami Saarinen Peter Towers. 11th ECMWF Workshop on the Use of HPC in Meteorology Slide 1

Computer Architectures and Aspects of NWP models

TIGGE and the EU Funded BRIDGE project

An Informal Introduction to MemCom

WHAT IS A DATABASE? There are at least six commonly known database types: flat, hierarchical, network, relational, dimensional, and object.

Fundamentals of Information Systems, Seventh Edition

Instituting an observation database (ODB) capability in the GSI

Instituting an observation database

Metview Macro Language

Greenplum Architecture Class Outline

cdo Data Processing (and Production) Luis Kornblueh, Uwe Schulzweida, Deike Kleberg, Thomas Jahns, Irina Fast

Unified Model Performance on the NEC SX-6

Metview BUFR Tutorial. Meteorological Visualisation Section Operations Department ECMWF

Bruce Wright, John Ward, Malcolm Field, Met Office, United Kingdom

Data about data is database Select correct option: True False Partially True None of the Above

AAPP status report and preparations for processing METOP data

The Logical Data Store

Chapter 11 Database Concepts

Introduction to ECMWF resources:

NWP SAF. SSMIS UPP Averaging Module. Technical Description NWP SAF. SSMIS UPP Averaging Module Technical Description. Version 1.0

SMD149 - Operating Systems - File systems

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

ALADIN Project Stay report version 1.0 (May 2016)

ECMWF contribution to the SMOS mission

UNIT 4 DATABASE SYSTEM CATALOGUE

Scalable Access to SAS Data Billy Clifford, SAS Institute Inc., Austin, TX

Uniform Resource Locator Wide Area Network World Climate Research Programme Coupled Model Intercomparison

The challenges of the ECMWF graphics packages

A problem of strong scalability?

Database Design. IIO30100 Tietokantojen suunnittelu. Michal Zabovsky. Presentation overview

Compiling environment

OLAP Introduction and Overview

Course Description. Audience. Prerequisites. At Course Completion. : Course 40074A : Microsoft SQL Server 2014 for Oracle DBAs

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

Memory Management! How the hardware and OS give application pgms:" The illusion of a large contiguous address space" Protection against each other"

Metview s new Python interface

Actian Hybrid Data Conference 2017 London Actian Corporation

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

DS Introduction to SQL Part 1 Single-Table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

DATABASE SYSTEMS CHAPTER 2 DATA MODELS 1 DESIGN IMPLEMENTATION AND MANAGEMENT INTERNATIONAL EDITION ROB CORONEL CROCKETT

The EU-funded BRIDGE project

Example Sheet for Operating Systems I (Part IA)

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

Index. Bitmap Heap Scan, 156 Bitmap Index Scan, 156. Rahul Batra 2018 R. Batra, SQL Primer,

The Personal Terabyte

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

Metview and Python - what they can do for each other

Parallel & Concurrent Programming: ZPL. Emery Berger CMPSCI 691W Spring 2006 AMHERST. Department of Computer Science UNIVERSITY OF MASSACHUSETTS

epygram Enhanced PYthon for GRaphics and Analysis of Meteorological fields Alexandre Mary 1, Sébastien Riette 2

Testing of the COPE framework for processing observations

Dominique Lucas Xavier Abellan Ecija User Support

The TIDB2 Meteo Experience

Software Infrastructure for Data Assimilation: Object Oriented Prediction System

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

eccodes BUFR encoding

EDMS. Architecture and Concepts

Database Management System 2

Optimizing Performance for Partitioned Mappings

SQL Server 2014 Column Store Indexes. Vivek Sanil Microsoft Sr. Premier Field Engineer

Where are we? Week -4: Data definition (Creation of the schema) Week -3: Data definition (Triggers) Week -1: Transactions and concurrency in ORACLE.

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

Metview 5.0 and Beyond, to its Pythonic Future

About Speaker. Shrwan Krishna Shrestha. /

SAS Scalable Performance Data Server 4.3

Disks, Memories & Buffer Management

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 8

MTA Database Administrator Fundamentals Course

CISQ Weakness Descriptions

What is wrong with PostgreSQL? OR What does Oracle have that PostgreSQL should? Richard Stephan

Interpolation. Computer User Training Course Paul Dando. User Support. ECMWF 25 February 2016

Data Infrastructure IRAP Training 6/27/2016

Basant Group of Institution

ECMWF New Users Metview Tutorial

Passit4sure.P questions

Mahathma Gandhi University

GFS-python: A Simplified GFS Implementation in Python

The functions performed by a typical DBMS are the following:

File Systems. CSE 2431: Introduction to Operating Systems Reading: Chap. 11, , 18.7, [OSC]

CUG Talk. In-situ data analytics for highly scalable cloud modelling on Cray machines. Nick Brown, EPCC

Determining Optimal MPI Process Placement for Large- Scale Meteorology Simulations with SGI MPIplace

Advanced Data Management Technologies Written Exam

IT Best Practices Audit TCS offers a wide range of IT Best Practices Audit content covering 15 subjects and over 2200 topics, including:

Introduction to SQL/PLSQL Accelerated Ed 2

Example Sheet for Operating Systems I (Part IA)

Oracle Database: SQL and PL/SQL Fundamentals Ed 2

Integrating OGC web services into Metview and Magics++

Bottom line: A database is the data stored and a database system is the software that manages the data. COSC Dr.

Transcription:

Observational DataBase (ODB*) and its usage at ECMWF anne.fouilloux@ecmwf.int Slide 1 *ODB has been developed and maintained by Sami Saarinen Slide 1

Outline Observational usage over the past decades at ECMWF Before ODB Observational DataBase (ODB) - What is ODB? - And what is NOT ODB! its current usage in IFS The way forward Slide 2 Slide 2

Observational usage over the past decades One of the major progress made over the last two decades in numerical weather prediction (NWP) can be attributed to the improved utilization of observations. 18 Number of data used per day (millions) 16 14 12 10 8 6 4 2 CONV+SAT WINDS TOTAL 0 1996 1997 1998 1999 2000 2001 2002 2003 date But this has been possible only thanks to Slide the 3usage of supercomputers as well as the development of efficient strategies to read/write/process these observations. 2004 2005 2006 2007 2008 2009 Slide 3

First step toward an efficient strategy Slide 4 Slide 4

CMA (Central Memory Array) file structure Based on encoding all data into IEEE 64 bit floating points. Once read, CMA were kept in memory for a fast data access. DDR 1 DDR 2 Observation report Observation report Observation report Data Description Record (fixed length) Each observation report (variable length) consisted of two parts: Header xxxxxxxxxxxxxxxxx Slide 5 Body yyyyyyyyyyyy yyyyyyyyyyyy yyyyyyyyyyyy yyyyyyyyyyyy yyyyyyyyyyyy Header: identification, position and time Slide 5 coordinates, etc. Body: observation value, pressure levels, channel numbers, etc.

With the introduction of 4D var in IFS and the growing number of satellite observations There was a need for a new approach to store and access observational data Slide 6 Slide 6

ODB (Observational DataBase) Sami Saarinen and al. came up with the idea of using relational database concepts for easier data selection and filtering: the ODB software was born (mid-1998; became operational in 2000). But what is ODB? - An incore database (like CMA) to improve efficiency - A format: inherited from CMA format (hierarchical format) - A hierarchical database with a data definition and query language: ODB/SQL language (subset of ANSI SQL) - A parallel fortran 90 interface to enable MPI-parallel data queries, but also to coordinate queries for data shuffling between MPI-tasks Slide 7 - A set of post-processing tools (odbsql, odbdiff, etc.) Slide 7

But ODB cannot Restrict the user's ability to retrieve, add or modify data by protecting unauthorized access. However, with Fortran90 access layer, an ODB database can be opened in READONLYmode. Share a database by concurrent users without interfering each other. Possible for READONLY-databases. Protect the database from corruption due to inconsistent updates or during system failures. Slide 8 Slide 8

ODB hierarchical data model ECMWF layout desc ddrs timeslot_index index hdr poolmask body errstat sat update1..3 satob atovs ssmi scatt reo3 atovs_pred ssmi_body scatt_body The desc ddrs hdr errstat satob, ssmi_body, table structure table atovs, table holds contains scatt_body contains ssmi, allows descriptive location scatt, Data the repeating observation tables actual Description reo3 and data tables are measurement time information similar error information each contain Records statistics to database body information, satellite using table parent/child every (analysis specific first (inherited date, guess from observation information relationships and analysis analysis CMA time, format) (similar (inherited departure (save etc.) disk to from header values space CMA tables) as and header) well memory as analysis consumption). status flag Slide 9 information. A table can be seen as a matrice (array or so called flat file) with a number of rows and columns containing numerical data. Slide 9

How to describe this hierarchy? Slide 10 Slide 10

ODB/SQL: Data Definition Language (DDL) CREATE TABLE hdr AS ( lat real, lon real, statid string, obstype int, date YYYYMMDD, time HHMMSS, status flags_ t, body @LINK, ); CREATE TABLE body AS ( varno pk5int, press pk9real, obsvalue pk9real, ); lat lon statid obstype date time status -14.78 143.5 ' 94187' 1 20081021 230000 1 A LINK tells how many times a row needs to be repeated (10 times in our example) and which table is involved (body) Slide 11 @LINK standard data type column name or attribute built-in date & time types packed data type composite data type (bit-field) LINK data type varno press obsvalue 1 100350 804.14 30 100100 120 39 99900 277.6 40 100350 292.4 58 100350 0.57 111 100840 260 112 100100 2 41 97670 12.9 42 95310-4.84e-15 80 100880 0 Slide 11

Parallelisation: a requirement for IFS Slide 12 Slide 12

ODB parallel database system Aims to improve performance through parallelization of various operations, such as loading data, building ODBs and evaluating queries. Data is stored in a distributed fashion - divide TABLEs horizontally into pools between processors; pools are assigned to the MPI-tasks in a round-robin fashion. - each table can be assigned to an openmp threads no. of pools "decided" in the Fortran90 layer SELECT data from all or a particular pool only Distribution of data among pools done at the ODB creation Slide 13 Slide 13

Example of data partitioning Pool#1 Table hdr lat lon statid obstype date time status -14.78 143.5 ' 94187' 1 20081021 230000 1 @LINK Table body varno press obsvalue 1 100350 804.14 30 100100 120 39 99900 277.6 Pool#2 lat lon statid obstype date time status -14.78 143.5 ' 94187' 1 20081021 230000 1 @LINK varno press obsvalue 40 100350 292.4 58 100350 0.57 111 100840 260 Pool#3 lat lon statid obstype date time status -14.78 143.5 ' 94187' 1 20081021 230000 1 @LINK varno press obsvalue 112 100100 2 41 97670 12.9 42 95310-4.84e-15 80 100880 0 A single pool forms a sub-database. Slide 14 Slide 14

Parallel I/O strategy I/O tasks To improve performance, only a subset of pools is selected to perform I/O (read/write ODB on disk). Similar tables are then concatenated together. The number of I/O pools is fully configurable Pool#1 Pool#2 Pool#3 Pool#4 assigned to available openmp threads hdr body errstat ssmi ssmi_body hdr body errstat ssmi ssmi_body hdr body errstat ssmi Slide 15 ssmi_body Slide 15 hdr body errstat ssmi ssmi_body

Example of an ODB database on disk > ls ECMA.iasi 1/ 141/ 183/ 218/ 265/ 43/ 85/ ECMA.iomap 107/ 145/ 193/ 225/ 266/ 49/ 97/ ECMA.sch 110/ 15/ 197/ 239/ 267/ 56/ 99/ IOASSIGN@ 113/ 155/ 211/ 241/ 272/ 57/ ECMA.IOASSIGN 121/ 164/ 212/ 25/ 281/ 71/ ECMA.dd 127/ 169/ 217/ 253/ 29/ 73/ ECMA.flags Metadata Pool directories > ls ECMA.iasi/1 atovs ddrs index sat ssmi update_2 atovs_body desc poolmask satob ssmi_body update_3 atovs_pred errstat reo3 scatt Slide 16 timeslot_index body hdr reo3_body scatt_body update_1 Slide 16

Data selection and filtering To read/update your database once it is created Slide 17 Slide 17

ODB/SQL Queries For existing ODBs only... [CREATE VIEW view_name AS] SELECT [DISTINCT] column_ name( s) FROM table( s) [WHERE some_ condition( s)_ to_ be_ met ] [ORDERBY sort_ column_ name( s) [ASC/ DESC] ] ODB/SQL (*) is a small subset of international standard SQL used to manipulate relational databases. It allows to define data queries in order retrieve (in parallel) a subset of data items. This is the main motivation of using ODB?! Except for the creation of a database or within IFS/ARPEGE where a Fortran program is necessary, ODB/SQL can be used in an Slide 18 interactive way via ODB-tools (odbviewer, odbsql, etc.). (*) SQL stands for Structured Query Language Slide 18

ODB/SQL example SELECT fahrenheit(obsvalue), // Convert from Kelvin to F abs(fg_depar an_depar) AS abs_delta FROM hdr, body WHERE obstype = $synop AND varno@body = $t2m AND obsvalue is not NULL ; Slide 19 odbsql -v request.sql -i /home/rd/stf/ecma.conv Slide 19

What about parallel data queries? Slide 20 Slide 20

Fortran 90 interface to ODB/SQL Parallel data queries are possible via the ODB Fortran90 interface layer; The Fortran 90 layer offers a unique user interface to - Open & close database - execute ODB/SQL queries, update & store queried data - Inquire information about database metadata The same code can be used in serial or parallel MPI/OpenMP mode (with any number of processors/openmp threads). SELECT ed data can be asked to be shuffled ( part- exchanged ) or replicated across processors; by default data selection applies to the local pools only. Slide 21 Slide 21

An example of Fortran program with ODB program main After CREATE ODB_select, VIEW sqlview nrows AS is null when an MPI task use odb_module implicit does SELECT not none hold lat, a lon, pool obsvalue, status@body integer(4) FROM :: hdr, h, rc, body nra, nrows, ncols, npools, j, jp real(8), allocatable:: x(:,:) npools= 0 h = ODB_open("ECMA", "OLD", npools=npools) DO jp=1,npools rc= ODB_select(h, "sqlview",nrows,ncols,poolno=jp) allocate(x(nrows,0:ncols)) rc= ODB_get(h, "sqlview",x,nrows,ncols,poolno=jp) call update(x,nrows,ncols)! Not an ODB-routine rc= ODB_put(h, "sqlview",x,nrows,ncols,poolno=jp) deallocate(x) rc= ODB_cancel(h, "sqlview",poolno=jp) ENDDO Slide 22 rc= ODB_close(h, save=.true.) end program main Slide 22

But how does it work in our 4Dvar system? Slide 23 Slide 23

ECMWF usage of ODB We use two main ODBs: - ECMA (Extended CMA): all observations (active/passive/blacklisted) - CCMA (Compressed CMA): active observations after IFS screening No unique centralized ODBs: we create new ODBs for each analysis ECMAs are created from bufr files: - Enables MPI-parallel database creation efficient - Distribution is done in bufr2odb in IFS for ECMA (pools done per obs. group). It is done again when creating CCMA from ECMA i.e. when creating a new database with active data only. ODBs archived in ECFS which is a large distributed storage system Slide 24 Feedback bufr files are created from ODBS at the end of the analysis and archived in MARS our Meteorological Archive. Slide 24

ODB within IFS/4Dvar system Archived in MARS or available on line on our HPCF Post-processing Slide 25 Archived in MARS (ECMWF main repository of meteorological data) Slide 25

Post-processing of ODBs Slide 26 Slide 26

ODB-tools and post-processing applications ODB tools (odbsql,etc.) User applications ODB library Metview obstat Obstat: Metview: odbsql: a a plotting tool to to compute access package ODB and (see plot data Sandor statistics in read/only presentation observations mode. done this assimilated morning odbcompress: in the to ECMWF create or a sub-odbs Meteo France from Slide assimilation an 27 existing suite. database See Mohamed simulobs2odb: presentation to create (given a new Tuesday...) ODB from an ascii file odbmerge: to combine several databases Slide 27

The way forward Slide 28 Slide 28

What next? ODB is now more than a tool dedicated to our 4Dvar system. It is now time to better integrate ODB in our full ECMWF system (from receiving observations to the archiving of feedback information) First step is to archive ODBs in our Meteorological archive (see Peter Kuchta presentation on Friday) More and more interest on ODB from external centres (ODB used by Australian Bureau of Meteorology, Melbourne; triggered some interest by UK Met Office; GMAO, Washington investigates the possibilities of ODB for their own usage, etc.) Make ODB easier to handle by external parties: revisit ECMWF DDL file, create a dictionary of ODB attributes and their usage, Slide 29 improve user interfaces, etc. Slide 29