ECMWF Web re-engineering project

Transcription:

ECMWF Web re-engineering project
Baudouin Raoult
Peter Bispham, Andy Brady, Ricardo Correa, Sylvie Lamy-Thepaut, Tim Orford, David Richardson, Cihan Sahin, Stephan Siemen, Carlos Valiente, Daniel Varela
Slide 1

The web re-engineering project
Motivation:
- Many of our users rely on our graphical web products for their daily work in their forecast offices, and have requested that our web services be continuously available
- At the Annual Users meetings, we have received requests to create tailored products (e.g. control the event threshold on probability maps)
Goals:
- Redesign the web infrastructure so that the web service is highly available and supported at the same level as the field dissemination
- Provide more interactivity (e.g. zoom, pan, overlay parameters)
- Allow product customisation (e.g. control the event threshold on probability maps)
- Use open (OGC) standards so that ECMWF products can be embedded in users' own software
Slide 2

The web re-engineering project (cont.)
2-year project to implement a new ECMWF web service that is:
- Highly available and operationally supported (same support as current dissemination)
- Aimed at forecasters
- Highly interactive
- Suitable for deployment as computer-to-computer standard web services
- Flexible to meet future requirements
Milestones:
- First prototype: November 2009
- Alpha release: February 2010
- Beta release: January 2011
- Operational release: June 2011
Slide 3

Requirements
Highly available
Operationally supported
- H/A hardware
- H/A software
- Operator monitoring
Performance
- Target: deliver a plot in under 1 second
Interactivity
- Pan, zoom, overlay (à la Google Maps)
- Customisation, plots on demand (e.g. changing event probability threshold)
Scalability
- Support any future user load
- Extensible: easy addition of new products
Slide 4

Gathering of user requirements
The project has been presented on several occasions:
- ECMWF Forecast Products Users Meeting, Computer Representatives Meeting
- Very positive feedback from forecasters
- Most forecaster requests focused on the desire to be able to create customised products
- Requests for new products
The consultation process will continue throughout the project
Slide 5

Service Oriented Architecture
Slide 6

Hardware
- 2 Foundry ServerIronGT load balancers
- 3 servers hosting web servers
- 3 servers hosting web application
- 3 servers hosting several virtual machines
- 6 servers hosting storage, compute and plot services
HP DL360 G5, dual 2.5 GHz quad-core Xeon, OpenSuSE Linux 11.1
Systems located in different parts of the building, attached to different routers and different power sources
Slide 7

Software
We investigated technologies used by the big players (e.g. Google, Yahoo, Amazon, Facebook, Wikipedia):
- Memcached (Very fast distributed memory)
- Tokyo Tyrant (Scalable, distributed persistent space)
- Hadoop (High availability and redundant distributed data)
- Xen (Virtualisation)
- DRBD (Network RAID)
- Ganeti (H/A Cluster management)
- Nagios (Alerts system)
- Scribe (Distributed logging)
Slide 8

Software (cont.)
- Ganglia (Distributed monitoring)
- Django (Python based Web framework, server side)
- jQuery (JavaScript based web framework, client side)
- OpenLayers (JavaScript based OGC WMS client)
- Apache 2.2 (Web server)
- MySQL (Database)
And of course:
- Magics++
- grib_api
- MARS
- Metview
Slide 9
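The slides do not say how Memcached is used to approach the one-second plot target. One common arrangement is a cache-aside lookup in front of the plotting service, sketched below under stated assumptions: a plain dict stands in for a Memcached client, and `PlotCache`, `fake_render` and the request fields are hypothetical names chosen for illustration only.

```python
import hashlib
import json


class PlotCache:
    """Cache-aside wrapper; a dict stands in for a Memcached client."""

    def __init__(self, render):
        self._render = render   # expensive function: params -> bytes
        self._store = {}        # stand-in for the distributed cache
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(params):
        # Serialise with sorted keys so equal requests map to one key.
        canonical = json.dumps(params, sort_keys=True)
        return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

    def get_plot(self, params):
        key = self.key_for(params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: render once, then keep the result for later requests.
        self.misses += 1
        image = self._render(params)
        self._store[key] = image
        return image


def fake_render(params):
    # Hypothetical placeholder for a call to a plotting service.
    return ("plot:" + json.dumps(params, sort_keys=True)).encode("utf-8")


cache = PlotCache(fake_render)
first = cache.get_plot({"param": "2t", "step": 24, "threshold": 5})
second = cache.get_plot({"threshold": 5, "step": 24, "param": "2t"})
```

The second request differs only in key order, so it is served from the cache. A real deployment would also need an expiry policy so that plots are refreshed when a new forecast arrives; that is not shown here.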

About Hadoop
A framework that supports data-intensive distributed applications
Inspired by Google's MapReduce and Google File System (GFS) white papers
Used by Yahoo, Amazon, IBM, Facebook, AOL, Fox, Last.fm, Microsoft
Hadoop HDFS
- Distributed storage, with a filesystem-like API (HDFS)
- Data nodes hold blocks of data. Each node uses local storage
- Name node holds the file names and the block locations (single point of failure)
- Each file is spread over several data nodes
- Each block has several copies distributed over the cluster
- Designed for large blocks (64 MB)
MapReduce facility to be investigated
Slide 10
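The HDFS layout described above (files cut into 64 MB blocks, each block copied to several data nodes, the name node holding the block-to-node map) can be illustrated with a small simulation. This is a sketch only: the round-robin placement is a simplification and not the HDFS placement policy, the replication factor of 3 is an assumption (the slide says only "several copies"), and the node names are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size quoted on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the list of block lengths for a file of file_size bytes."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(n_blocks, data_nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    The returned dict plays the role of the name node's block map.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the replication factor")
    placement = {}
    for block in range(n_blocks):
        placement[block] = [
            data_nodes[(block + i) % len(data_nodes)]
            for i in range(replication)
        ]
    return placement


nodes = ["dn1", "dn2", "dn3", "dn4", "dn5", "dn6"]  # hypothetical node names
blocks = split_into_blocks(200 * 1024 * 1024)       # a 200 MB file
placement = place_replicas(len(blocks), nodes)
```

A 200 MB file yields three full blocks and one 8 MB remainder. Because every block lives on three different nodes, losing a single data node leaves two copies of each block available, while the block map itself remains the single point of failure noted on the slide.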

About Ganeti (H/A Pairs)
Ganeti is a cluster virtual server management software tool built on top of existing virtualization technologies (Google)
Xen virtual machines (hardware-assisted virtualization: 3% overhead)
DRBD (Distributed Replicated Block Device)
- Network RAID1 (20% overhead on write, 0% on read)
Live migration
- Two-pass memory migration: 10 s for 12 GB of memory (a stoppage of around 60 to 300 ms is required to perform the final synchronization)
- No interruption of service: IP connections are not broken (MAC address moves)
- Failover: restart VM on backup machine
- Command line tools: can be done by operators
Slide 11

Service Oriented Architecture
Multi-tier architecture, deployed on a series of Linux clusters:
- Web frontend (Web server)
- Web backend (Dynamic page generation)
- Services (Plotting, probability computations, EPSgrams, ...)
- Data layer (Raw fields)
Cluster approach provides built-in scalability, redundancy and load balancing
Critical components run on virtual machines that can be redeployed dynamically
Slide 12

Deployment
Virtual machines for critical components and single points of failure
- Hadoop name node
- SOA Broker
- Spot database
- Catalogue (MySQL)
- All virtual machines sized in such a way that they can fit in a smaller number of nodes if necessary
Physical machines for components with built-in redundancy
- Hadoop data nodes
- Memcached servers
- Services (plot, retrieve, probabilities, EPSgrams, ...)
Slide 13
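The sizing rule above (virtual machines must still fit when fewer nodes are available) can be checked with a simple packing test. The sketch below uses a first-fit-decreasing heuristic on memory only; the VM names mirror the slide, but all memory sizes and host capacities are invented for illustration, and CPU, disk and DRBD pairing constraints are ignored.

```python
def fits(vm_memory_gb, host_capacities_gb):
    """First-fit decreasing: return a VM -> host index plan, or None.

    None means this heuristic found no placement; it is not a proof
    that no placement exists.
    """
    free = list(host_capacities_gb)
    plan = {}
    # Place the largest VMs first, each on the first host with room.
    for name, need in sorted(vm_memory_gb.items(), key=lambda kv: -kv[1]):
        for host, available in enumerate(free):
            if need <= available:
                free[host] -= need
                plan[name] = host
                break
        else:
            return None
    return plan


# Hypothetical memory sizes in GB; the slides give no figures.
vms = {"name_node": 4, "soa_broker": 2, "spot_db": 4, "catalogue": 4}

plan_three_hosts = fits(vms, [8, 8, 8])  # normal operation
plan_two_hosts = fits(vms, [8, 8])       # one VM host lost
plan_one_host = fits(vms, [8])           # two VM hosts lost
```

With these made-up numbers the four virtual machines still fit on two hosts but not on one, which is the kind of margin the sizing rule is meant to guarantee.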

Developing in an SOA environment (is hard)
- Distributed design
- Troubleshooting
- Diagnostics tools
- Instrumentation
- Regression tests
Slide 14

Web user interface
- Involvement of external design companies
- Focus on usability
Slide 15

Prototype: Forecasting tool
Interactivity: zooming, panning, ...
Customisation:
- Probability thresholds, ...
- Show/hide, add/remove layers
Related products: EPSgrams
Slide 16
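The slides do not describe how a probability map is recomputed when the user changes the event threshold. A common definition, assumed here, is the fraction of ensemble members that exceed the threshold at each grid point. The function name and the sample values below are hypothetical.

```python
def exceedance_probability(members, threshold):
    """Per grid point, the fraction of ensemble members above threshold.

    `members` is a list of equally long lists, one per ensemble member.
    """
    n_members = len(members)
    n_points = len(members[0])
    return [
        sum(1 for m in members if m[p] > threshold) / n_members
        for p in range(n_points)
    ]


# Four made-up ensemble members over three grid points (e.g. mm of rain).
ensemble = [
    [0.0, 6.0, 12.0],
    [1.0, 4.0, 11.0],
    [0.5, 7.5, 9.0],
    [2.0, 5.5, 15.0],
]

prob_above_5 = exceedance_probability(ensemble, 5.0)
prob_above_10 = exceedance_probability(ensemble, 10.0)
```

Raising the threshold from 5 to 10 changes the second grid point from 0.75 to 0.0, which is why a user-controlled threshold implies computing the map on demand and not serving a fixed set of pre-rendered images.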

Slides 17 to 29 (no text content)

Prototype: Catalogue browsing
- Browsable catalogue
- Link to Forecaster tool
- Limited interactivity
- Preset number of projections, animation
- Similar to current web catalogue, but uses the WREP infrastructure
Slide 30

Prototype: OpenLayers integration
- Alternate interface under investigation
- Overlay layers, addition of external data sources
- On top of WREP infrastructure: tiles are created on demand
Slide 31

Prototype: OGC Web Map Services
Aim: to make it possible to embed ECMWF products directly in the forecasters' workstations
On top of WREP infrastructure:
- GetCapabilities document built dynamically from product catalogue content
- Layers are created on demand
Challenges: access control, time dimension, customisation
Slide 32
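To make the WMS challenges concrete, the sketch below builds a WMS 1.3.0 GetMap request using only the Python standard library. The parameter names come from the WMS standard, including `TIME` and the `DIM_` prefix for extra dimensions. The endpoint URL, the layer name and the `DIM_THRESHOLD` dimension are hypothetical; the slides do not say how the prototype exposes time or customisation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def getmap_url(base_url, layer, bbox, width, height, time=None, extra=None):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 uses
    longitude/latitude axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    if time is not None:
        params["TIME"] = time  # time dimension, ISO 8601
    if extra:
        params.update(extra)   # e.g. server-defined DIM_* dimensions
    return base_url + "?" + urlencode(params)


url = getmap_url(
    "https://wms.example.org/wms",       # hypothetical endpoint
    "t2m_probability",                   # hypothetical layer name
    (-30, 30, 50, 75),
    512,
    288,
    time="2010-02-01T12:00:00Z",
    extra={"DIM_THRESHOLD": "5"},        # hypothetical custom dimension
)
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
```

Passing the probability threshold as a custom dimension is one possible way to carry customisation through a standard client. Whether a given workstation forwards such parameters depends on the client, which may be part of why customisation is listed as a challenge.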

Conclusion: fully functional proof of concept
- All products created on demand (2D maps, EPSgrams)
- Zoom, pan, overlay
- Customisation: setting of probability thresholds, contouring
- Browsable catalogue
- Initial user interface
- OGC Web Map Service (WMS)
Slide 33

Future work
- Persistence
- Security and access control
- Monitoring, alerts and service statistics
- Management tools
- Performance tuning
- Further development of the WMS aspect
- More products
- User testing
Slide 34

Thank you
Come to see our demo at the exhibition (Meeting Room 1)
Slide 35