ARCHITECTURE OF MADIS DATA PROCESSING AND DISTRIBUTION AT FSL

Similar documents
RECENT ADVANCES IN THE FSL CENTRAL FACILITY DATA SYSTEMS. Robert C. Lipschutz* and Christopher H. MacDermaid*

4.7 MULTICAST DATA DISTRIBUTION ON THE AWIPS LOCAL AREA NETWORK

AWIPS Technology Infusion Darien Davis NOAA/OAR Forecast Systems Laboratory Systems Development Division April 12, 2005

Microsoft Office SharePoint Server 2007

8A.4 An Exercise in Extending AWIPS II

AMS Board on Enterprise Communications Improving Weather Forecasts November 29, Data Sources: Pulling it all together

GTRC Hosting Infrastructure Reports

The advantages of architecting an open iscsi SAN

Database Development Update GIS Update Visual Analysis Tools Update New Hardware Update

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments

EMC CLARiiON CX3-40. Reference Architecture. Enterprise Solutions for Microsoft Exchange Enabled by MirrorView/S

A High-Performance Storage and Ultra- High-Speed File Transfer Solution for Collaborative Life Sciences Research

Ivanti Service Desk and Asset Manager Technical Specifications and Architecture Guidelines

Data Management Components for a Research Data Archive

Pinnacle 3 SmartEnterprise

Pinnacle3 Professional

From Sensor to Archive: Observational Data Services at NCAR s Earth Observing Laboratory

Amara's law : Overestimating the effects of a technology in the short run and underestimating the effects in the long run

Unidata and data-proximate analysis and visualization in the cloud

RAIDIX Data Storage Solution. Data Storage for a VMware Virtualization Cluster

McIDAS - XCD McIDAS Users Group Meeting

NOAA NextGen IT/Web Services (NGITWS)

Silicon House. Phone: / / / Enquiry: Visit:

The power of centralized computing at your fingertips

Performance Comparisons of Dell PowerEdge Servers with SQL Server 2000 Service Pack 4 Enterprise Product Group (EPG)

NetVault Backup Client and Server Sizing Guide 3.0

Backup and Recovery. Benefits. Introduction. Best-in-class offering. Easy-to-use Backup and Recovery solution

NetVault Backup Client and Server Sizing Guide 2.1

IBM System Storage DS6800

12/04/ Dell Inc. All Rights Reserved. 1

WHITE PAPER: BEST PRACTICES. Sizing and Scalability Recommendations for Symantec Endpoint Protection. Symantec Enterprise Security Solutions Group

IT and Interface Requirements

IBM System Storage DS5020 Express

Use of the Internet SCSI (iscsi) protocol

SPACE WEATHER PRODUCT GENERATION TRANSITION FROM SWPC to NESDIS. Arthur T McClinton Jr* Noblis, Inc., Falls Church, Virginia. 1.

What is QES 2.1? Agenda. Supported Model. Live demo

Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays

Cisco Prime Performance 1.3 Installation Requirements

Table 1 The Elastic Stack use cases Use case Industry or vertical market Operational log analytics: Gain real-time operational insight, reduce Mean Ti

GOES-R Update Greg Mandt System Program Director

Moving Weather Model Ensembles To A Geospatial Database For The Aviation Weather Center

SQL Server 2005 on a Dell Scalable Enterprise Foundation

Computer Science Section. Computational and Information Systems Laboratory National Center for Atmospheric Research

ISILON X-SERIES. Isilon X210. Isilon X410 ARCHITECTURE SPECIFICATION SHEET Dell Inc. or its subsidiaries.

Comparison: Microsoft Logical Disk Manager (LDM) and VERITAS Volume Manager

Interstage Big Data Complex Event Processing Server V1.0.0

IBM 3850-Mass storage system

Minimum Requirements for Cencon 4 with Microsoft R SQL 2008 R2 Enterprise

INNOVATIVE SD-WAN TECHNOLOGY

BeoLink.org. Design and build an inexpensive DFS. Fabrizio Manfredi Furuholmen. FrOSCon August 2008

Enterprise Monitoring (EM) for the Defense Meteorological Satellite Program (DMSP) Ground System (GS)

Performance Characterization of the Dell Flexible Computing On-Demand Desktop Streaming Solution

Business Continuity and Disaster Recovery. Ed Crowley Ch 12

System Performance: Sizing and Tuning

Standard System Specifications

Dell Exchange 2007 Advisor and Representative Deployments

Dell PowerVault MD3460 Series Storage Arrays Deployment Guide

NTP Software VFM Recovery Portal

High Availability and Disaster Recovery features in Microsoft Exchange Server 2007 SP1

Dell Compellent Storage Center and Windows Server 2012/R2 ODX

Table 9. ASCI Data Storage Requirements

IBM Proventia Management SiteProtector Installation Guide

Safe Place and Code Alert Customer Information Technology Requirements Series 10.x Software

Representing LEAD Experiments in a FEDORA digital repository

14A.2 OPEN AND INTEROPERABLE NWS GEOSPATIAL DATA FROM NOAA IDP NEXTGEN IT WEB SERVICES 1. INTRODUCTION

5PRESENTING AND DISSEMINATING

Accelerating Parallel Analysis of Scientific Simulation Data via Zazen

NetEVS 3.1 User Manual

NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION S SCIENTIFIC DATA STEWARDSHIP PROGRAM

Code Alert Customer Information Technology Requirements Series 30 Software

Installation Guide. CloudShell Version: Release Date: June Document Version: 1.0

Metbox Training Outline LDM Training Metbox overview Break Metbox hands on training

IBM InfoSphere Streams v4.0 Performance Best Practices

IT Business Management System Requirements Guide

Uniform Resource Locator Wide Area Network World Climate Research Programme Coupled Model Intercomparison

rdxlock_readme.txt Overland Tandberg rdxlock readme for Version Build Major Changes:

Emerging Technologies for HPC Storage

NOAA Satellite and Information Service Dan St. Jean, NESDIS Office of Systems Architecture and Advance Planning

RAIDIX Data Storage System. Entry-Level Data Storage System for Video Surveillance (up to 200 cameras)

Running and maintaining a secure Unity RIS/CVIS/PACS

EMC Celerra CNS with CLARiiON Storage

Digital Video Storage Systems

Edge for All Business

Contingency Planning and Disaster Recovery

SPECIFICATION SHEET. Document ID: C. Abbott Point of Care Abbott Park, IL 60064

NTP Software Defendex (formerly known as NTP Software File Auditor) for NetApp

Data Protection at Cloud Scale. A reference architecture for VMware Data Protection using CommVault Simpana IntelliSnap and Dell Compellent Storage

HP Network Automation Support Matrix

Dell Solution for JD Edwards EnterpriseOne with Windows and SQL 2000 for 50 Users Utilizing Dell PowerEdge Servers And Dell Services

Storage Consolidation with the Dell PowerVault MD3000i iscsi Storage

vstart 50 VMware vsphere Solution Specification

CONDUIT and NAWIPS Migration to AWIPS II Status Updates Unidata Users Committee. NCEP Central Operations April 18, 2013

Breaking the Performance Wall: The Case for Distributed Digital Forensics

EMC ISILON X-SERIES. Specifications. EMC Isilon X200. EMC Isilon X400. EMC Isilon X410 ARCHITECTURE

Demartek December 2007

Quick Reference Guide What s New in NSi AutoStore TM 6.0

NetAlly. Application Advisor. Distributed Sites and Applications. Monitor and troubleshoot end user application experience.

The power of centralized computing at your fingertips

IBM TotalStorage Enterprise Storage Server Model 800

Virtual Server Agent for VMware VMware VADP Virtualization Architecture

Transcription:

P2.39 ARCHITECTURE OF MADIS DATA PROCESSING AND DISTRIBUTION AT FSL 1. INTRODUCTION Chris H. MacDermaid*, Robert C. Lipschutz*, Patrick Hildreth*, Richard A. Ryan*, Amenda B. Stanley*, Michael F. Barth, and Patricia A. Miller NOAA Research Forecast Systems Laboratory Boulder, Colorado *In collaboration with the Cooperative Institute for Research in the Atmosphere (CIRA), Colorado State University, Fort Collins, Colorado The NOAA Forecast Systems Laboratory's (FSL) Meteorological Assimilation Data Ingest System (MADIS) (Miller, et al. 2005) supplies quality controlled (QC) observational data to a large and growing segment of the meteorological community, including operational National Weather Service modeling and forecasting users. The requirements for this system have dramatically expanded to accommodate the increasing volume and types of data handled, as well as the growing user base. Today, the system ingests and merges data from dozens of sources, and supplies the resultant quality controlled products to hundreds of users via several standard protocols. Both realtime and saved MADIS datasets are made available. While it needs to be highly reliable, the system must be sized for the CPU-intensive QC analyses, and must provide controlled access to the data so that proprietary datasets are only available to appropriate users. Methods have also been developed for keeping track of the large, diverse user community and consistently responding to specific user needs, for example, notification of significant events. Corresponding author address: Chris MacDermaid, NOAA/OAR/FSL R/FST, 325 Broadway, Boulder, CO 80305-3328; e-mail: Chris.MacDermaid@noaa.gov To ensure reliability of MADIS, the system runs within the FSL Central Facility, a controlled environment maintained by FSL's Information and Technology Services, which provides system administration, networking, configuration management, data acquisition/distribution, and monitoring support (Lipschutz and MacDermaid, 2005). MADIS comprises a distributed architecture for ingest, processing, and data distribution functions. In addition, in a recent architectural advance, the various functional hosts are arranged in pairs using High-Availability (HA) Linux to provide automated failover in case of system failure. Real-time data distribution methods include FTP, Local Data Manager (LDM), and the Web-based OPeNDAP (OPen source project for Network Data Access Protocol). Special consideration for protecting proprietary datasets has added an extra layer of complexity, but is well handled by the configuration methods implemented on the LDM and FTP servers. In this paper, we review the requirements and architecture of the operational implementation of MADIS at FSL. We describe our use of High- Availability Linux toward achieving high system reliability, and consider our implementation of the data distribution methods. Finally, we also discuss our user support infrastructure. 2. SYSTEM REQUIREMENTS MADIS has a number of requirements that the Central Facility needs to meet. Among the requirements are acquiring data from many sources, moving the data between processing nodes, and distributing to MADIS's many clients. MADIS routinely acquires mesonet data from several dozen network providers representing over 14,000 stations. This translates into over 30,000 station reports each hour. Several systems running both inside and outside the firewall acquire the data. These data are sent to the Central Facility's Advanced Weather Interactive Processing System (AWIPS) data server for preprocessing and conversion to a common format, NetCDF. The NetCDF files are then transferred to the MADIS compute nodes. These distributions are accomplished using LDM, which writes the data to local disk for access by the MADIS applications. MADIS also requires access to other surface and upper air data sets. These are acquired through many methods including FTP, LDM, dedicated networks, and satellites. The data are processed using ODS into NetCDF and made avail-

able to MADIS through the Central Facility's NFS attached fileserver. MADIS data files are made available to FSL users on the Central Facility's fileserver that is widely mounted via NFS throughout FSL. For distribution outside FSL, files are distributed to the Central Facility's LDM server for real-time distribution and to an FTP server and OPeNDAP server for real-time and archived distribution. Additionally, several files are routinely distributed to the National Weather Service Telecommunications Gateway (NWSTG) for distribution to the National Weather Service Offices for operational backup. In addition, QC'ed mesonet data are transferred to the National Centers for Environmental Prediction (NCEP). Finally, MADIS requires archiving of all observations. This archive is maintained on the Central Facility's Mass Store System (MSS) and on-line. 3. ARCHITECTURE MADIS comprises a large set of machines for its numerous functions. As shown in Fig. 1, data are transported from input and preprocessing systems to the central compute systems and then out to other hosts for storage and distribution. Fig. 1. MADIS Flow The MADIS compute platforms comprise Intelbased servers running the RedHat Enterprise Linux operating system. Many of these computers are configured using High-Availability Linux clustering. This supports a two-node primary/secondary model with a serial heartbeat and IP address takeover. This is configured so that the secondary node is a hot backup for the primary node and in the event a hardware failure occurs, the secondary node automatically takes over processing. Configurations and system images are managed for these machines using the open source tools CVS and SystemImager (Lipschutz and MacDermaid, 2005). Starting from the left side of Fig. 1, the mesonet data ingest function is shared across two hosts. On one host, scripts are configured to get data via FTP and HTTP from mesonet providers. The other host is running FTP and LDM servers to which mesonet providers are enabled to put data. These data are sent through LDM to the HAconfigured AWIPS Local Data Acquisition and Dissemination (LDAD) server. Preprocessing to get these data into a comma-separated-value (CSV) format is performed. The AWIPS LDAD processing system then converts these into NetCDF files. These data are then sent to the compute nodes where they are written to local disk for QC processing. Additional surface and upper air data are ingested into the Central Facility and processed using ODS. These additional data are made available to MADIS through the Central Facility's NFS attached file server. The QC processing nodes are also configured in a HA cluster; the hardware for this system is the Dell 1750 dual 3.2 GHZ Xeon with 4 GB of RAM,

and a directly attached Dell PowerVault 220 storage device with 8 15,000 RPM 36 GB disks in a RAID 0+1 configuration. The RAID 0+1 configuration was chosen because it enables high I/O rates thanks to its multiple stripe segments. The local I/O required by this processing is currently about 100MB/S. Since some MADIS data are considered proprietary by the provider, access to these data need to be controlled. To accommodate this, MADIS data are split into six different versions based on the level of access allowed by the data provider. A user accesses these data based on their access level. Distribution to FSL users is through the Central Facility's NFS mounted fileserver and MSS. Distribution to other clients is accomplished using FTP, LDM, and OPeNDAP. Multiple LDM servers serve this data out to about 100 users. FTP and OPeNDAP servers run on a shared host to distribute data to over 250 users. 4. USER SUPPORT An important element of the real-time system for MADIS is the user support function. In particular, with the very large number of clients dependent on a reliable data stream, mechanisms for monitoring the system and notifying users about data outage events are critical. Users gain access to this data through a request through the MADIS web page, http://wwwsdd.fsl.noaa.gov/madis/data_application.html. Central Facility operators monitor MADIS through the Facility Information and Control System (FICS). See Fig. 2 for a view of the FICS interface. FICS is a web-based monitor that is used to monitor all FSL's real-time Central Facility processing including MADIS ingest, processing and distribution. Through the use of green, yellow, and red icons, operators can quickly determine MADIS's state. Through links on the web page, operators are able to find troubleshooting information, log the troubleshooting steps taken, and send outage notices to users.

Fig. 2. A Typical FICS Display In addition to FICS, computer resources are monitored by a customized event monitoring system based on Nagios (http://www.nagios.org). Network monitoring is done by the Central Facility's network administration using SolarWinds (http://www.solarwinds.net). Another essential aspect of user support is tracking client contact information. Mailing lists of the MADIS clients' contact information and data types requested are maintained to notify the users of data outages. Currently, contact and account information is maintained in text files in several different locations. This is being moved to an Open Lightweight Directory Access Protocol (OpenLDAP) directory approach to better maintain information. A directory is a database optimized for reading, browsing, and searching. The directory interface is shown in Fig. 3.

Fig. 3. User Directory Interface 5. CONCLUDING REMARKS FSL's Central Facility will provide support for the ever-increasing MADIS data processing, ingest, and distribution. To increase reliability, the transition of the rest of MADIS to a HAconfiguration will continue. Ongoing research into other clustering technologies such as Single System Image Clusters for Linux (OpenSSI) (http://openssi.org) will continue. 6. REFERENCES Miller P. A., M. F. Barth, and L. A. Benjamin, 2005: An update on MADIS observation ingest, integration, quality control, and distribution capabilities. Preprints, 21st International Lipschutz R. C. and, C. H. MacDermaid, 2005: Recent advances in the FSL Central Facility Data Systems. Preprints, 21st International Hamer, P., 2005: FSL Central Facility Data System concepts. Preprints, 21 st International