The Portal Aspect of the LSST Science Platform. Gregory Dubois-Felsmann Caltech/IPAC. LSST2017 August 16, 2017


Purpose of the LSST Science Platform (LSP)
- Enable access to the LSST data products
- Enable visualization and exploration of the LSST data
- Provide an interface for added-value processing and analysis close to the data
- Provide access to documentation: data content, data quality, data processing, software, survey, Observatory systems
- Enable users to access contextual knowledge of the LSST survey and data, and facilitate their understanding of the relationships among the data products
Context:
- Support LSST data rights scientists, Science Collaborations, and LSST Project staff
- Needs to be simple enough to engage general users and flexible enough to meet the needs of experienced and ambitious users
- Users will grow and adapt with time

The Science Platform Vision
- Provide a Portal with access to all LSST data products, visualization and exploration tools, documentation, and structured workflows that guide the users to discover and understand the data and their semantic connections.
- Provide a flexible, interactive computational environment (Python Notebook) with access to the LSST Data Products, computing, and storage, and with pre-provisioned access to installed versions of the LSST Python software stack.
- Underpin these user environments with a set of externally available APIs providing access to the Data Products, primarily based on IVOA standards; a computational infrastructure; a flexible LSST and user storage infrastructure; and a database system supporting both the Project data products and users' own databases.
- Enable the users to create workflows that cross between the Portal and Notebook environments and the use of the APIs as their needs dictate.

Vision of the Science Platform
Portal:
- Enables discovery and exploration of the LSST data with structured workflows
- Contextual information about the semantics of the LSST data is used to guide the workflow and the users
Make it easy to share, analyze, and visualize results between the structured Portal environment and the flexible Notebook environment
Notebook:
- Enables exploration and analysis of the LSST data with flexible workflows
- Environment has access to tools developed by LSST as well as outside tools
APIs: Unifying Infrastructure

Science Platform Underpinnings
[Diagram: layered architecture, top to bottom]
- Portal, User Workflows, Notebook
- APIs and Middleware
- LSST Data Products (LSE-163): LSST Databases (L1, L2, Cal, EFD); LSST Files (e.g., images)
- User Databases; User Storage; User Computing; LSST Stack
- Data Access Center Infrastructure (or equivalent for other instances): container deployment, batch computing, storage, identity management, ...


The Portal Aspect of the Platform
- Data Discovery: lays out the full breadth of the LSST data with documentation
  - Provides both data-product/table-oriented discovery and all-sky exploration
- Data Connections: presents the semantic links between data
- Data Query: supports both basic (UI-driven) and ADQL query building
  - Provides access to both the LSST data and standards-compliant external data
- Data Visualization: provides LSST-aware visualizations (images, tabular data, more)
- Exploratory Data Analysis: brushing and linking, filtering, synthetic columns, histogramming, basic astronomical tools such as time-series analysis
- User Data Access: read/write access to workspace and Level 3 Data Products (user DBs)
- Provides structured environment for the novice
- Provides a starting point and reference for the expert, with a seamless transition to interactive analysis in the Notebook Aspect
- Dedicated portals support Observatory processes (QC/V&V, commissioning)
- The Portal also provides a control interface to the Alert Filtering & Distribution services

Portal Implementation
- The Portal is based on the Firefly framework and libraries for the construction of astronomical archive interfaces
  - Firefly originated with the Spitzer Heritage Archive and provides the underpinnings of the NASA Infrared Science Archive (IRSA)
- Firefly has a client-server, distributed architecture that enables scalable handling of large result sets and large numbers of users
  - The Firefly server is a Java application
  - The Firefly client is a JavaScript application built on the React framework
- Firefly provides a variety of native visualization tools (images, plots, tables) and a shared data model for query results that supports brushing and linking and other exploratory data analysis operations
- Connection to Python: the Portal will rely on the LSST stack and the Science Platform's computing resources to enable LSST-specific extensions to the core Firefly capabilities
  - Example: the visualization of the LSST footprint / de-blending model
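The shared data model behind brushing and linking can be pictured with a small sketch. This is not Firefly's API: the `SharedTable` class, its `subscribe` and `brush` methods, and the column names are all hypothetical, invented for illustration. The point it shows is that when every view observes one result set, a selection made in any one view reaches all the others.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, float]


class SharedTable:
    """Hypothetical shared result set that several views observe."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.selected: Set[int] = set()
        self._listeners: List[Callable[[Set[int]], None]] = []

    def subscribe(self, listener: Callable[[Set[int]], None]) -> None:
        """Register a view callback that fires whenever the selection changes."""
        self._listeners.append(listener)

    def brush(self, predicate: Callable[[Row], bool]) -> None:
        """Select the rows matching the predicate and notify every view."""
        self.selected = {i for i, row in enumerate(self.rows) if predicate(row)}
        for listener in self._listeners:
            listener(self.selected)


# Illustrative rows; the column names are assumptions.
table = SharedTable([
    {"ra": 10.0, "dec": -5.0, "mag": 18.2},
    {"ra": 10.1, "dec": -5.1, "mag": 21.7},
    {"ra": 10.2, "dec": -5.2, "mag": 19.4},
])

# Two stand-in "views": each records which row indices it should highlight.
plot_highlight: List[int] = []
image_overlay: List[int] = []


def update_plot(selection: Set[int]) -> None:
    plot_highlight[:] = sorted(selection)


def update_image(selection: Set[int]) -> None:
    image_overlay[:] = sorted(selection)


table.subscribe(update_plot)
table.subscribe(update_image)

# Brushing in one view (here: sources brighter than magnitude 20) updates both.
table.brush(lambda row: row["mag"] < 20.0)
print(plot_highlight, image_overlay)
```

Keeping the selection in the shared table, rather than in each view, is what lets a table, a plot, and an image overlay stay consistent without knowing about one another.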

Discovery and Connections
- Provides a structured environment to guide the user in exploring LSST data
- Highlights the most important science data products but also provides access to all other released data, including:
  - Reformatted Engineering and Facilities Database (EFD)
  - Metadata content and related documents (written by SUIT or other systems)
- Data connections (examples):
  - Find all the Processed Visit Images (PVIs) used to make a coadded image
  - Display the images that an object was extracted from
  - Display the light curve data for a selected object
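The three example data connections can be sketched as lookups over linked records. The record layouts, identifiers, and function names below are hypothetical and chosen only for illustration; they do not reflect the actual LSST schema or any Portal API.

```python
from typing import List, Tuple

# Hypothetical provenance records: which visit images went into which coadd.
COADD_INPUTS: List[Tuple[str, str]] = [
    ("coadd-A", "pvi-1"),
    ("coadd-A", "pvi-2"),
    ("coadd-B", "pvi-2"),
    ("coadd-B", "pvi-3"),
]

# Hypothetical per-visit measurements: (object_id, pvi_id, mjd, flux).
MEASUREMENTS: List[Tuple[int, str, float, float]] = [
    (42, "pvi-2", 60002.1, 11.8),
    (42, "pvi-1", 60001.3, 12.5),
    (7, "pvi-3", 60003.0, 3.2),
    (42, "pvi-3", 60003.0, 12.1),
]


def pvis_for_coadd(coadd_id: str) -> List[str]:
    """Find all the visit images used to make a coadded image."""
    return [pvi for coadd, pvi in COADD_INPUTS if coadd == coadd_id]


def images_for_object(object_id: int) -> List[str]:
    """List the images in which an object was measured."""
    return sorted({pvi for obj, pvi, _, _ in MEASUREMENTS if obj == object_id})


def light_curve(object_id: int) -> List[Tuple[float, float]]:
    """Return (mjd, flux) points for one object, ordered by time."""
    points = [(mjd, flux) for obj, _, mjd, flux in MEASUREMENTS if obj == object_id]
    return sorted(points)


print(pvis_for_coadd("coadd-B"))
print(images_for_object(42))
print(light_curve(42))
```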

Data Query
- Provide customized query panels to support contextual workflows for a finite set of data products (~10)
- Provide a generic query builder for all LSST data products, with metadata information, description, and data type to help users construct desired queries, as well as the ability to issue an ADQL statement
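To make the idea of a UI-driven query builder concrete, here is a minimal sketch of how form inputs might be turned into an ADQL cone-search statement. The function name, the table name `example.sources`, and the column names are assumptions for illustration only; the ADQL geometry functions (`CONTAINS`, `POINT`, `CIRCLE`) and `TOP` are standard ADQL 2.0.

```python
import re
from typing import Optional, Sequence

# Only plain (optionally schema-qualified) identifiers are accepted, so that
# values typed into a form cannot smuggle extra ADQL into the statement.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def build_cone_search(
    table: str,
    columns: Sequence[str],
    ra_deg: float,
    dec_deg: float,
    radius_deg: float,
    limit: Optional[int] = None,
    ra_col: str = "ra",
    dec_col: str = "dec",
) -> str:
    """Build an ADQL cone search around (ra_deg, dec_deg)."""
    for name in (table, ra_col, dec_col, *columns):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    top = f"TOP {int(limit)} " if limit is not None else ""
    return (
        f"SELECT {top}{', '.join(columns)} FROM {table} "
        f"WHERE CONTAINS(POINT('ICRS', {ra_col}, {dec_col}), "
        f"CIRCLE('ICRS', {float(ra_deg)}, {float(dec_deg)}, {float(radius_deg)})) = 1"
    )


# Hypothetical table and columns, standing in for a real data product.
query = build_cone_search(
    "example.sources", ["source_id", "ra", "dec"], 10.5, -5.0, 0.1, limit=10
)
print(query)
```

The resulting string is what a user could equally have typed by hand as an ADQL statement, which is why the two entry paths can share one back end.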

Data Visualization and Exploration
- Brushing and linking data displays: images, tables, plots, and histograms
- Use LSST pipeline modules to enable LSST-specific data visualization
  - Footprint, deblending model

More visualizations
- Flexible layout scheme allows assembling many linked visualizations

Many Faces of Firefly

Portal Workspace Connections
- Provide access to user workspace and user databases
  - User workspace: private file-oriented read-write storage (VOSpace)
  - User databases: import user data to join; retain and export query results
  - Access controls for user data (enabling collaboration)
- Provide UI for users to understand their storage and compute resources
- Provide UI for users to manage query history

Portal Documentation
- Guide users to documentation of various types
- SUIT team will write online help, a user guide, and API documentation
- Portal will provide links to other documentation: data products, science pipeline code, survey strategy, data quality reports, etc.


Cross-Aspect Connections
[Diagram: Portal (Query/Visualize: xyz) and Notebook (Analyze: xyz), joined by User Workflows; each is linked to the APIs (Unifying Infrastructure) by a "Data/Query ID: xyz"]
- Connections enable sharing of data between components and therefore more complex workflows and analysis

Cross-Aspect Connections
- Users are not confined to one aspect or the other
- The design provides simple data connections among all the aspects of the Science Platform
  - Enabled by the common DAX architecture and computing infrastructure
  - Enabled by the use of JavaScript tools for visualization, and the APIs of the Portal's Firefly infrastructure
- Find or create data in one aspect; view or analyze that data in another aspect
- Queries are shareable across the Portal and the Notebook, e.g.:
  - Build a query in the Portal UI; verify the results by browsing them in the UI; access the results from the Notebook for further analysis
  - Code a complex ADQL query in the Notebook; browse results in the Portal
  - Capture a query formulated in the Portal as code reusable later as a Notebook-driven query
- Connections can be made either through identity (DAX knows the queries you have recently performed) or through simple UI actions (e.g., copy/paste of a query token between windows)
- Moving from data discovery to analysis
  - We expect to provide a one-click means of launching a fresh notebook with pre-configured access to the results of data identified in the Portal
- Large query support
  - DAX interfaces support paging
  - Portal will integrate this with user-level paging
  - Still thinking about what the ultimate constraints will be
  - LSST will provide a separate Bulk Download Service for very large-scale exports
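As a rough illustration of how a shared query token and service-side paging could combine, the sketch below pages through the results of a query identified only by its token. Everything here is an assumption made for illustration: the `fetch_page(token, offset, limit)` call shape, the token `q-123`, and the in-memory stand-in for the service are hypothetical and are not the DAX interface.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, int]
FetchPage = Callable[[str, int, int], List[Row]]


def iter_rows(fetch_page: FetchPage, query_token: str, page_size: int = 1000) -> Iterator[Row]:
    """Yield every row of a stored query result, requesting one page at a time."""
    offset = 0
    while True:
        page = fetch_page(query_token, offset, page_size)
        yield from page
        # A short (or empty) page means the result set is exhausted.
        if len(page) < page_size:
            return
        offset += len(page)


# In-memory stand-in for the service: results are kept under a query token,
# as if the query had been run earlier in the Portal.
STORED_RESULTS: Dict[str, List[Row]] = {"q-123": [{"n": i} for i in range(7)]}
requested_offsets: List[int] = []


def fake_fetch_page(query_token: str, offset: int, limit: int) -> List[Row]:
    requested_offsets.append(offset)
    return STORED_RESULTS[query_token][offset:offset + limit]


# "Notebook side": only the token is needed to pick the results up again.
rows = list(iter_rows(fake_fetch_page, "q-123", page_size=3))
print(len(rows), requested_offsets)
```

Because the consumer is a generator, a notebook could stop early without fetching the remaining pages, which is one way user-level paging can sit on top of service-level paging.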

Deployments of the Science Platform
Customers of the Science Platform:
- LSST Data Rights Scientists, Science Collaborations, and LSST Project staff
Deployments:
- US and Chilean Data Access Centers
  - Supporting LSST's science users
- Commissioning Cluster
  - Supporting the process of commissioning the Summit systems (Telescope, Camera, OCS), and providing resources for the Observatory engineering team in the longer run
- Science Validation Environment
  - Supporting QC, verification, and validation work on software releases and on Data Releases
- Integration Environment (a.k.a. Prototype Data Access Center, PDAC)
  - Analysis & Developer Support Domain
  - Supporting integration and test of the Science Platform itself, pre-release verification of the LSP

Deployment Schedule
- Integration Environment (i.e., PDAC): now
  - Initial version of PDAC was released in December 2016
  - Started with demonstration of integration of basic Portal Aspect tools with DAX APIs
  - Regular updates with additional data and functionality are planned (next slide)
- Science Validation Environment: end of 2017, maturing in 2018
  - Planned to support the Science Pipelines development and DM verification & validation activities
  - Initially focused on the HSC public-release data
  - Key initial features: add Notebook Aspect, add basic user database support
- Commissioning Cluster: mid 2019
  - Supports clean-room testing of ComCam and the main Camera, and spectrograph operations
  - All Aspects; key initial features: generic table access and support for EFD data access
- US Data Access Center: 2021
  - Provides outside access to commissioning data; use for Commissioning of DM starts earlier!
- Chilean Data Access Center: 2022

PDAC Evolution
- PDACv1: December 2016
  - SDSS Stripe 82 data, basic Portal functionality
  - Tests basic integration of Portal tools with Qserv & DAX services
- PDACv2: Now (July 2017)
  - Add WISE data (AllWISE catalog tables, image access)
  - NEOWISE single-epoch photometry later, as ingest files become available from IRSA
  - Scaling tests for Qserv; additional Portal features
  - Basic login capabilities
- PDACv3: Late 2018
  - Add HSC data
  - Visualizations ready for LSSTCam support; Portal UX improvements
  - Notebook Aspect (JupyterLab) added
  - Authentication and authorization used throughout
  - User workspace
- User testing of each version
  - PDACv1: DMTR-22

PDAC Evolution (2)
- PDACv4: Fall 2019
  - Portal configuration to support ComCam data
  - All-sky visualizations; further UX improvements
  - Full user table support
- PDACv5: 2020
  - Next-to-database processing
  - Prototype of DAC Portal UX
- Continued use of PDAC to deploy new versions all the way through start of operations
- PDAC will be used for (internal) system integration and scaling tests, and
- User testing will be performed at each major release; we plan to reach out to a slowly growing set of science users to

Access for Data Rights Holders
- Full access to the Data Access Centers' Science Platform environments will come with the start of operations
- We plan to release commissioning data to data rights holders before that, but:
  - Expect a months-long period of analysis by the team before the data are released
  - This will be on a best-effort basis by the pre-operations team
  - The current commitment is only to provide a bulk download facility

Portal Demo
- Demonstration of the Portal Aspect in the current version of PDAC

Science Platform
[Architecture diagram. Labeled components: Identity Provider; Browser; Web Portal; Firefly Servers; DB (Qserv, other); Firefly Widgets; Jupyter Client; JupyterLab Servers (managed by JupyterHub); IPython Kernel; Firefly Python; DAX microservices; External Users of LSST APIs; TAP, SIA (https); VOSpace (WebDAV, https); Image Storage; User Computing; Parallel/Batch Computing; User Storage]

IPAC
[Architecture diagram. Labeled components: Identity Provider; Browser; Web Portal; Firefly Servers (load-balanced); DB (Qserv, other); Firefly Widgets; Jupyter Client; JupyterLab Servers (managed by JupyterHub); IPython Kernel; Firefly Python; DAX microservices; External Users of LSST APIs; TAP, SIA (https); VOSpace (WebDAV, https); Image Storage; User Computing; Parallel/Batch Computing; User Storage]