Cumulus Services Working Group. Dan Pilone SE TIM / August 2017


Cumulus Services Working Group Dan Pilone dan@element84.com SE TIM / August 2017


Reminder: Why are we doing this?

Background: Motivation for Cloud
- Growth of Mission Data & Processing: Projected rapid archive growth and the need to effectively process significantly larger volumes of new mission data require rethinking existing architectures. Estimated 132 TB daily ingest by 2022 (forward-stream keep-up rate)! *
- Data Systems: More cost-effective, flexible, and scalable data system ingest, archive, and distribution solutions are needed to keep pace with new mission advancements and capabilities.
- Science Users: Significantly larger data volumes require additional ways to access and use this data, with data close to compute.
* 47 PB annually
* Bulk reprocessing not included, e.g. the NISAR mission alone has bulk processing spikes of 400+ TB per day
[Chart: Estimated Daily Data Volume Over 5 Years]
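As a quick consistency check on the two figures above, the daily ingest estimate can be converted to an annual volume. This is only a sketch: the 365-day year and the binary conversion (1 PB = 1024 TB) are assumptions, not something the slide states, and the function name is illustrative.

```python
DAILY_INGEST_TB = 132   # from the slide: estimated daily ingest by 2022
DAYS_PER_YEAR = 365     # assumption: non-leap year
TB_PER_PB = 1024        # assumption: binary units


def annual_volume_pb(daily_tb: float) -> float:
    """Convert a steady daily ingest rate (TB/day) into PB per year."""
    return daily_tb * DAYS_PER_YEAR / TB_PER_PB


if __name__ == "__main__":
    print(f"{annual_volume_pb(DAILY_INGEST_TB):.2f} PB per year")
```

Under these assumptions the result lands just above 47 PB, which matches the slide's footnote. With decimal units (1 PB = 1000 TB) the same rate would come to a little over 48 PB.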

ExCEL Efforts and Project Prototypes
- NGAP: NASA Compliant General Application Platform (NGAP), an operational, dev-ops, and sandbox AWS cloud-based operating environment.
- ASF WOS Prototype: AWS/NGAP Web Object Storage (WOS), prototyping the dynamic movement of large volumes of mission data between AWS S3, S3-IA, and Glacier object storage. Managed out of the Alaska Satellite Facility.
- Earthdata Search Client to Cloud: NASA Earth Science data search by keyword and advanced filters such as time and space.
- Cumulus: Prototype addressing core EOSDIS capabilities including data ingest, archive, management, and distribution of large volumes of EOS data.
- Getting Ready for NISAR (GRFN): Integrated prototype of science product generation and delivery from a DAAC system, focused on coupling the ASF DAAC and JPL ARIA systems.
- CATEES: Easy-to-use Python tools packaged to support EOSDIS cross-DAAC science workflows and analytics over large volumes of EOS data in AWS.
- ECC to Cloud Study: Earth Code Collaborative (ECC) study to determine cloud-ready capabilities to migrate into the AWS/NGAP platform.

ExCEL Efforts and Project Prototypes, Continued
- GIBS in the Cloud: Migrating GIBS to the AWS/NGAP cloud based on recommendations made in the GIBS in the Cloud study.
- Earthdata Login to Cloud Study: Study to determine and recommend how to migrate Earthdata Login into the AWS/NGAP cloud environment.
- CMR to Cloud: Migration of the Common Metadata Repository into the AWS/NGAP platform based on recommendations made in the CMR to Cloud study.
- OPeNDAP/HDF Cloud Studies: Study to determine and recommend a cloud-native integration of OPeNDAP accessing HDF5 and netCDF-4 data on the AWS/NGAP platform.
- NEXUS: Highly parallel prototype to accelerate end-user analysis of remote sensing data and better enable science discovery.
- Network Prototypes: Network prototypes to test security, monitoring, and logging, and to perform R&D testing in support of all ExCEL project prototypes.


Session Agenda
Part I - Introduction and Overview
- What is the Services Working Group?
- Cumulus, the EOSDIS Cloud Archive, and you
Part II - Services & Reference Patterns
- Data Download
- Computing Near the Data
- Compute with Local Caching
Part III - Cloud Archive Discussions
- Egress and cost management
- Service Data Distribution*
- Challenges to and measurements of success
- What's next?

Part I Cumulus Services Working Group

Success is when we have...
- Defined the boundaries of Cumulus proper vs. services on EOSDIS Cloud Archive(s)
- Defined what services can assume with respect to data accessibility
- Documented core use cases and reference architectures
- Captured what the group's needs and experience suggest for data representation decisions, e.g. data lake, alternate representations (e.g. Parquet), and access patterns

FY17 Group Outputs
- High-level architecture diagrams
- Reference patterns for services
- Recommendations for data access, use, delivery, and staging
[Image caption: Not an actual reference pattern]

Participating Members: ASF, AWS, Cumulus, EED2 Services, ESDIS, GIBS in the Cloud, Giovanni, LPDAAC, NSIDC, OPeNDAP, PO.DAAC, UAH


Context View: Data Provider


[Diagram labels: On-premises data center; On-prem processing and data generation; On-prem storage (FTP, HTTP, etc.)]

[Diagram labels: Cumulus; Product Workflows; S3 Product Bucket; PDRs, HTTP, FTP, etc.; On-premises data center; On-prem processing and data generation; On-prem storage (FTP, HTTP, etc.)]

[Diagram labels: Cumulus; Product Workflows; S3 Product Bucket; SIPS Bucket; On-premises data center (x2); On-prem processing and data generation (x2); On-prem storage (FTP, HTTP, etc.)]

[Diagram labels: Cumulus; Product Workflows; SIPS & Product Bucket; EDPA?; Cloud-based SIPS; Cloud-based processing & data generation; On-premises data center (x3); On-prem processing and data generation (x2); On-prem storage; On-prem storage (FTP, HTTP, etc.)]

Context View: DAAC


Cumulus is intended to be the starting point for common DAAC functionality. It is designed to be extended through workflows, hooks, and pluggable components and containers.
[Diagram labels: DAAC 1, DAAC 2, and DAAC 3, each shown with its own Extensions and Cumulus]


Context View: Using EOSDIS Archive Data


EOS Archive Use Cases
- Basic Data Access
- Compute Near Data
- Service-based Data Distribution

Part II Services and Reference Patterns

Basic Data Access


Compute Near the Data: SAR OnDemand at ASF

[Diagram labels: DAAC; Create; Earthdata Login; SIPS; Submit; SDS API; List, Get; Order API; Order; UI; Operator; User; Monitor; Order DB; SDS; Ingest]

Cloud Compute with Local Cache: AppEEARS from LPDAAC


Part III Cloud Archive Discussions

NGAP, EGRESS, AND THE ADA

Egress is a *big* deal
- Egress happens when data leaves your application, service, S3 bucket, etc. and goes to another region or outside of AWS
- Egress is expensive
- Rack rates: $0.08/GB after the first 150 TB
- Becomes a significant portion of total monthly cloud-associated costs
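To give a feel for the scale, the quoted rack rate can be applied to a monthly egress volume. This is a rough sketch only: it prices just the portion beyond the first 150 TB (the slide does not give rates for the lower tiers), it assumes 1 TB = 1024 GB, and the function name and the 1000 TB example volume are illustrative rather than taken from the deck.

```python
RATE_PER_GB_USD = 0.08   # from the slide: rack rate after the first 150 TB
TIER_START_TB = 150      # from the slide
GB_PER_TB = 1024         # assumption: binary units


def egress_cost_above_tier(total_egress_tb: float) -> float:
    """Return the USD cost of egress beyond the first 150 TB only.

    The cost of the first 150 TB is NOT included because the slide
    does not state the rates for the lower tiers.
    """
    billable_tb = max(0.0, total_egress_tb - TIER_START_TB)
    return billable_tb * GB_PER_TB * RATE_PER_GB_USD


if __name__ == "__main__":
    monthly_tb = 1000  # hypothetical monthly egress volume
    print(f"${egress_cost_above_tier(monthly_tb):,.2f} above the first tier")
```

Under these assumptions, a hypothetical 1000 TB month would cost on the order of $70,000 for the volume above 150 TB alone, which illustrates why egress can dominate a monthly bill.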

Different kinds of Egress
- EOSDIS data distribution from S3
- Application interactions (e.g. CMR results, EDSC, Earthdata Pages, etc.)
- EOSDIS data from services
- EOSDIS data to AWS compute

Cost isn't the biggest issue
- A huge bill is bad, but jail is worse.
- The Anti-Deficiency Act (ADA) disallows unbounded costs
- We need a means of absolutely limiting egress costs

Egress is mostly S3 data
[Chart labels: NGAP S3 Storage; All other NGAP egress]
* Approximations based on current usage

Enter the circuit breaker

Operational NGAP Architecture
[Diagram labels: GPMCE shared account; NGAP account; ELBs; Routers; Apps; NGAP 1.0/1.1; NGAP 1.5]

Circuit breakers protect the house
- S3 is the biggest issue
- Limit S3 egress and you address the majority of the problem
- S3 egress is actually the easiest to calculate
- This is not sophisticated by design
- If egress hits a threshold, break the circuit
- All this does is keep Mark out of jail
- Applications will want to do more

We are working on an egress shaping solution

Conceptual Design
Lambda 1: Calculate S3 egress
- Watch each bucket's BytesDownloaded via CloudWatch
- Post totals
Lambda 2: Break the circuit (if needed)
- If the total from the first of the billing period to $NOW exceeds the threshold, lock down the S3 bucket policy
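The two-Lambda design above can be sketched as a few pure functions. This is a minimal illustration, not the NGAP implementation: the function names, the sample numbers, and the shape of the deny policy are all assumptions. Reading `BytesDownloaded` from CloudWatch and applying the policy to a bucket (for example with boto3's `get_metric_statistics` and `put_bucket_policy`) are deliberately left out so the logic can run without AWS access.

```python
import json
from typing import Dict, Iterable, List


def total_egress_bytes(per_bucket_datapoints: Dict[str, Iterable[float]]) -> float:
    """Lambda 1 (sketch): sum BytesDownloaded datapoints across all buckets.

    The input maps bucket name -> datapoints since the start of the billing
    period; fetching them from CloudWatch is out of scope here.
    """
    return sum(sum(points) for points in per_bucket_datapoints.values())


def should_break_circuit(total_bytes: float, threshold_bytes: float) -> bool:
    """Lambda 2 (sketch): break only when the total exceeds the threshold."""
    return total_bytes > threshold_bytes


def lockdown_policy(bucket: str) -> dict:
    """Build a hypothetical bucket policy that denies all object reads."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EgressCircuitBreaker",  # illustrative name
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def buckets_to_lock(per_bucket_datapoints: Dict[str, Iterable[float]],
                    threshold_bytes: float) -> List[str]:
    """Return every bucket name if the circuit should break, else nothing."""
    total = total_egress_bytes(per_bucket_datapoints)
    if should_break_circuit(total, threshold_bytes):
        return sorted(per_bucket_datapoints)
    return []


if __name__ == "__main__":
    # Hypothetical usage figures in bytes, for illustration only.
    usage = {"product-bucket": [4e12, 3e12], "browse-bucket": [1e12]}
    for name in buckets_to_lock(usage, threshold_bytes=5e12):
        print(name, json.dumps(lockdown_policy(name)))
```

A blanket deny is intentionally blunt, in line with "not sophisticated by design": it would also block in-region reads, so a real policy would likely need exceptions. Note also that `BytesDownloaded` is an S3 request metric, which has to be enabled per bucket before CloudWatch reports it.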


#keepmarkfree

Services and Reference Patterns - redux

Different kinds of Egress
- EOSDIS data distribution from S3
- Application interactions (e.g. CMR results, EDSC, Earthdata Pages, etc.)
- EOSDIS data from services
- EOSDIS data to AWS compute

"S3 is a Distribution Mechanism" - Mar[ck] @ AWS


Discussion Topics

Is this helpful?

Best way to capture this?
[Image caption: Not an actual reference pattern]

What do you need to incentivize users to move compute to the data?

More Questions
- Who is the audience?
- Service metrics capture?
- Barriers to moving user compute to data?
- Additional service patterns, reference architectures, sequence diagrams, developer guides, etc. you'd like to see?
- Metrics that would demonstrate successful exploitation of the cloud archive?

Even more
- Virtual directories / virtual S3 indices
- Egress traffic shaping solution
- Cloud-native service implementations
- Hot data management à la Chris S.'s talk
- Multiple representations of data
- Hybrid archive and distribution discussion
- Demonstration of a new, large-scale compute use case
- Integration of preservation and DR efforts

Thank you!
- The working group meets every two weeks
- Material presented here will continue to evolve and be rolled into documentation and patterns available on the Earthdata Wiki
- Please let us know what would be helpful to encourage adoption and transition!