Cumulus Services Working Group
Dan Pilone, dan@element84.com
SE TIM / August 2017
Reminder: Why are we doing this?
Background: Motivation for Cloud
Growth of Mission Data & Processing: Projected rapid archive growth and the need to effectively process significantly larger volumes of new mission data require rethinking existing architectures. Estimated 132 TB daily ingest by 2022 (roughly 47 PB annually*), a forward-stream keep-up rate!
Data Systems: More cost-effective, flexible, and scalable data system ingest, archive, and distribution solutions are needed to keep pace with new mission advancement and capabilities.
Science Users: Significantly larger data volumes require additional ways to access and utilize this data, with data close to compute.
(Chart: Estimated Daily Data Volume Over 5 Years)
* Bulk reprocessing not included; e.g., the NISAR mission alone has bulk processing spikes of 400+ TB per day.
ExCEL Efforts and Project Prototypes
- NGAP: NASA Compliant General Application Platform, an operational, dev-ops, and sandbox AWS cloud-based operating environment.
- ASF WOS Prototype: AWS/NGAP Web Object Storage (WOS) prototype managing large volumes of mission data dynamically between AWS S3, S3-IA, and Glacier object storage. Managed out of the Alaska Satellite Facility.
- Earthdata Search Client to Cloud: NASA Earth science data search by keyword and advanced filters such as time and space.
- Cumulus: Prototype addressing core EOSDIS capabilities including data ingest, archive, management, and distribution of large volumes of EOS data.
- Getting Ready for NISAR (GRFN): Integrated prototype of science product generation and delivery from a DAAC system, focused on coupling the ASF DAAC and JPL ARIA systems.
- CATEES: Easy-to-use Python tools packaged to support EOSDIS cross-DAAC science workflows and analytics over large volumes of EOS data in AWS.
- ECC to Cloud Study: Earth Code Collaborative (ECC) study to determine cloud-ready capabilities to migrate into the AWS/NGAP platform.
ExCEL Efforts and Project Prototypes (continued)
- GIBS in the Cloud: Migrating GIBS to the AWS/NGAP cloud based on recommendations made in the GIBS in the Cloud Study.
- Earthdata Login to Cloud Study: Study to determine and recommend migrating Earthdata Login into the AWS/NGAP cloud environment.
- CMR to Cloud: Migration of the Common Metadata Repository into the AWS/NGAP platform based on recommendations made in the CMR to Cloud study.
- OPeNDAP/HDF Cloud Studies: Study to determine and recommend a cloud-native integration of OPeNDAP accessing HDF5 and netCDF-4 data on the AWS/NGAP platform.
- NEXUS: Prototype to accelerate end-user analysis of remote sensing data; highly parallel to better enable science discovery.
- Network Prototypes: Network prototypes to test security, monitoring, and logging, and to perform R&D testing in support of all ExCEL project prototypes.
Session Agenda
Part I - Introduction and Overview
- What is the Services Working Group?
- Cumulus, the EOSDIS Cloud Archive, and you
Part II - Services & Reference Patterns
- Data Download
- Computing Near the Data
- Compute with Local Caching
Part III - Cloud Archive Discussions
- Egress and cost management
- Service Data Distribution*
- Challenges to and measurements of success
- What's next?
Part I Cumulus Services Working Group
Success is when we have...
- Defined the boundaries of Cumulus proper vs. services on EOSDIS cloud archive(s)
- Defined what services can assume with respect to data accessibility
- Documented core use cases and reference architectures
- Captured what the group's needs and experience suggest for data representation decisions, e.g. data lake, alternate representations (e.g. Parquet), and access patterns
FY17 Group Outputs
- High-level architecture diagrams
- Reference patterns for services
- Recommendations for data access, use, delivery, and staging
Participating Members: ASF, AWS, Cumulus, EED2 Services, ESDIS, GIBS in the Cloud, Giovanni, LPDAAC, NSIDC, OPeNDAP, PO.DAAC, UAH
Context View: Data Provider
(Diagram: an on-premises data center with on-prem processing, data generation, and storage (FTP, HTTP, etc.))
(Diagram: Cumulus product workflows pull from on-prem storage via PDRs, HTTP, FTP, etc. into an S3 product bucket.)
(Diagram: multiple on-premises data centers feed Cumulus; a SIPS bucket stages provider data alongside the S3 product bucket.)
(Diagram: cloud-based SIPS processing and data generation joins the on-prem providers; a combined SIPS & product bucket feeds the Cumulus product workflows. EDPA?)
Context View: DAAC
Cumulus intends to be the starting point for common DAAC functionality. It is designed to be extended via workflows, hooks, and pluggable components and containers. (Diagram: DAAC 1, DAAC 2, and DAAC 3 each run Cumulus plus their own extensions.)
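The extension model above can be sketched as a pipeline of pluggable steps. This is a hypothetical illustration only: Cumulus itself wires containerized tasks into AWS Step Functions workflows rather than Python callables, and the step names here (`sync_granule`, `publish_to_cmr`, `daac_checksum_step`) are invented for the example.

```python
from typing import Callable, Dict, List

Granule = Dict[str, str]
Step = Callable[[Granule], Granule]

# Core steps shipped with the common baseline (names are illustrative).
def sync_granule(granule: Granule) -> Granule:
    granule["status"] = "synced"
    return granule

def publish_to_cmr(granule: Granule) -> Granule:
    granule["status"] = "published"
    return granule

def run_workflow(granule: Granule, steps: List[Step]) -> Granule:
    """Run each pluggable step in order, threading the granule through."""
    for step in steps:
        granule = step(granule)
    return granule

# A DAAC extends the core workflow by inserting its own step:
def daac_checksum_step(granule: Granule) -> Granule:
    granule["checksum_verified"] = "true"
    return granule

core = [sync_granule, publish_to_cmr]
extended = [sync_granule, daac_checksum_step, publish_to_cmr]
```

The point of the pattern is that each DAAC keeps the shared steps untouched and composes its extensions around them, rather than forking the baseline.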
Context View: Using EOSDIS Archive Data
EOS Archive Use Cases
- Basic Data Access
- Compute Near Data
- Service-based Data Distribution
Part II Services and Reference Patterns
Basic Data Access
Compute Near the Data: SAR OnDemand at ASF
(Diagram: order flow among the user, Earthdata Login, Order UI, Order API (create, submit, list, get), order DB, operator monitoring, SDS API, SIPS, and DAAC ingest.)
Cloud Compute with Local Cache: AppEEARS from LPDAAC
Part III Cloud Archive Discussions
NGAP, EGRESS, AND THE ADA
Egress is a *big* deal
- Egress: when data leaves your application, service, S3 bucket, etc. and goes to another region outside of AWS
- Egress is expensive. Rack rate: $0.08/GB after the first 150 TB
- It becomes a significant portion of total monthly cloud-associated costs
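At these volumes the rack rate adds up quickly. A back-of-the-envelope sketch using only the figure quoted above ($0.08/GB beyond the first 150 TB); this is a simplification, since real AWS pricing is tiered and the first 150 TB is billed at other tier rates that are omitted here.

```python
# Illustrative only: models just the $0.08/GB-beyond-150-TB rack rate
# quoted on the slide; the first 150 TB's (higher) tier rates are ignored.
FREE_TIER_TB = 150
RATE_PER_GB = 0.08

def monthly_egress_cost(egress_tb):
    """Dollar cost of monthly egress beyond the first 150 TB."""
    billable_gb = max(egress_tb - FREE_TIER_TB, 0) * 1024
    return billable_gb * RATE_PER_GB

# Distributing 1 PB (1024 TB) in a month:
print(round(monthly_egress_cost(1024), 2))  # 71598.08
```

Even under this undercount, a petabyte-scale distribution month runs to tens of thousands of dollars, which is why egress dominates the cloud-cost discussion below.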
Different kinds of Egress
- EOSDIS data distribution from S3
- Application interactions (e.g. CMR results, EDSC, Earthdata pages, etc.)
- EOSDIS data from services
- EOSDIS data to AWS compute
Cost isn't the biggest issue. A huge bill is bad, but jail is worse: the Anti-Deficiency Act (ADA) disallows unbounded costs. We need a means of absolutely limiting egress costs.
Egress is mostly S3 data (Chart: NGAP S3 storage vs. all other NGAP egress; approximations based on current usage)
Enter the circuit breaker
(Diagram: operational NGAP architecture; a GPMCE shared account (NGAP 1.0/1.1) and an NGAP account (NGAP 1.5), each with ELBs, routers, and apps.)
Circuit breakers protect the house
- S3 is the biggest issue: limit S3 egress and you address the majority of the problem
- S3 egress is actually the easiest to calculate
- This is not sophisticated by design: if egress hits a threshold, break the circuit
- All this does is keep Mark out of jail; applications will want to do more
We are working on an Egress shaping solution
Conceptual Design
- Lambda 1: Calculate S3 egress. Watch each bucket's BytesDownloaded via CloudWatch; post totals.
- Lambda 2: Break the circuit (if needed). If the total from the first of the billing period to $NOW exceeds the threshold, lock down the S3 bucket policy.
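The two Lambdas above can be sketched roughly as follows. This is a minimal sketch, not the actual implementation: the 150 TB cap, the calendar-month billing period, and the `EntireBucket` metrics filter name are assumptions, and S3 only emits `BytesDownloaded` to CloudWatch if request metrics are enabled on the bucket. The decision logic and the deny policy are kept as pure functions so they can be tested without AWS.

```python
import json
from datetime import datetime, timezone

# Hypothetical monthly cap (150 TB); the real threshold is a policy decision.
EGRESS_LIMIT_BYTES = 150 * 1024**4

def month_start(now):
    """First instant of the current billing period (assumed: calendar month)."""
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

def should_break(total_bytes, limit=EGRESS_LIMIT_BYTES):
    """Pure decision: trip the breaker once month-to-date egress hits the cap."""
    return total_bytes >= limit

def deny_get_policy(bucket):
    """Bucket policy that blocks GETs once the breaker trips."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "EgressCircuitBreaker",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }

def lambda_calculate_egress(bucket, now=None):
    """Lambda 1: sum BytesDownloaded for the bucket since the billing period began."""
    import boto3  # available in the Lambda runtime
    now = now or datetime.now(timezone.utc)
    cw = boto3.client("cloudwatch")
    resp = cw.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName="BytesDownloaded",
        Dimensions=[{"Name": "BucketName", "Value": bucket},
                    {"Name": "FilterId", "Value": "EntireBucket"}],
        StartTime=month_start(now), EndTime=now,
        Period=86400, Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in resp["Datapoints"])

def lambda_break_circuit(bucket, total_bytes):
    """Lambda 2: if over the threshold, lock the bucket down."""
    if should_break(total_bytes):
        import boto3
        boto3.client("s3").put_bucket_policy(
            Bucket=bucket, Policy=json.dumps(deny_get_policy(bucket)))
        return True
    return False
```

By design this only denies further `s3:GetObject` requests; reopening the circuit (e.g. at the start of the next billing period) would need a corresponding policy removal step.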
#keepmarkfree
Services and Reference Patterns - redux
Different kinds of Egress (redux)
- EOSDIS data distribution from S3
- Application interactions (e.g. CMR results, EDSC, Earthdata pages, etc.)
- EOSDIS data from services
- EOSDIS data to AWS compute
"S3 is a Distribution Mechanism" - Mar[ck] @ AWS
Discussion Topics
Is this helpful?
Best way to capture this? Not an actual reference pattern
What do you need to incentivize users to move compute to the data?
More Questions
- Who is the audience?
- Service metrics capture?
- Barriers to moving user compute to data?
- Additional service patterns, reference architectures, sequence diagrams, developer guides, etc. you'd like to see?
- Metrics that would demonstrate successful exploitation of the cloud archive?
Even more
- Virtual directories / virtual S3 indices
- Egress traffic shaping solution
- Cloud-native service implementations
- Hot data management a la Chris S.'s talk
- Multiple representations of data
- Hybrid archive and distribution discussion
- Demonstration of a new, large-scale compute use case
- Integration of preservation and DR efforts
Thank you! The working group meets every two weeks. Material presented here will continue to evolve and be rolled into documentation and patterns available on the Earthdata wiki. Please let us know what would be helpful to encourage adoption and transition!