The Portal Aspect of the LSST Science Platform
Gregory Dubois-Felsmann, Caltech/IPAC
LSST2017, August 16, 2017
Purpose of the LSST Science Platform (LSP)
- Enable access to the LSST data products
- Enable visualization and exploration of the LSST data
- Provide an interface for added-value processing and analysis close to the data
- Provide access to documentation: data content, data quality, data processing, software, survey, Observatory systems
- Enable users to access contextual knowledge of the LSST survey and data, and facilitate their understanding of the relationships among the data products
- Context:
  - Support LSST data rights scientists, Science Collaborations, and LSST Project staff
  - Needs to be simple enough to engage general users and flexible enough to meet the needs of experienced and ambitious users
  - Users will grow and adapt with time
The Science Platform Vision
- Provide a Portal with access to all LSST data products, visualization and exploration tools, documentation, and structured workflows that guide the users to discover and understand the data and their semantic connections.
- Provide a flexible, interactive computational environment (Python Notebook) with access to the LSST Data Products, computing, and storage, and with pre-provisioned access to installed versions of the LSST Python software stack.
- Underpin these user environments with a set of externally available APIs providing access to the Data Products, primarily based on IVOA standards; a computational infrastructure; a flexible LSST and user storage infrastructure; and a database system supporting both the Project data products and users' own databases.
- Enable the users to create workflows that cross between the Portal and Notebook environments and the use of the APIs as their needs dictate.
Vision of the Science Platform
- Portal: enables discovery and exploration of the LSST data with structured workflows. Contextual information about the semantics of the LSST data is used to guide the workflow and the users.
- Make it easy to share, analyze, and visualize results between the structured Portal environment and the flexible Notebook environment.
- Notebook: enables exploration and analysis of the LSST data with flexible workflows. The environment has access to tools developed by LSST as well as outside tools.
- APIs: unifying infrastructure
Science Platform Underpinnings (diagram labels)
- Portal, User Workflows, Notebook
- APIs and Middleware
- LSST Data Products (LSE-163): LSST Databases (L1, L2, Cal, EFD); LSST Files (e.g., Images)
- User Databases, User Storage, User Computing, LSST Stack
- Data Access Center Infrastructure (or equivalent for other instances): container deployment, batch computing, storage, identity management, ...
The Portal Aspect of the Platform
- Data Discovery: lays out the full breadth of the LSST data with documentation
  - Provides both data-product/table-oriented discovery and all-sky exploration
- Data Connections: presents the semantic links between data
- Data Query: supports both basic (UI-driven) and ADQL query building
  - Provides access to both the LSST data and standards-compliant external data
- Data Visualization: provides LSST-aware visualizations (images, tabular data, more)
- Exploratory Data Analysis: brushing and linking, filtering, synthetic columns, histogramming, basic astronomical tools such as time-series analysis
- User Data Access: read/write access to workspace and Level 3 Data Products (user DBs)
- Provides structured environment for the novice
- Provides a starting point and reference for the expert, with a seamless transition to interactive analysis in the Notebook Aspect
- Dedicated portals support Observatory processes (QC/V&V, commissioning)
- The Portal also provides a control interface to the Alert Filtering & Distribution services
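The exploratory operations named above (synthetic columns, filtering, histogramming) can be illustrated with a minimal, standard-library-only sketch. It is not Firefly or Portal code; the column names `g` and `r`, the sample magnitudes, and the helper names are all assumptions made for illustration.

```python
# Illustrative only: plain-Python versions of three exploratory operations.
# Column names, values, and function names are hypothetical.

def add_synthetic_column(rows, name, func):
    """Return new rows with an extra column computed from each row."""
    return [{**row, name: func(row)} for row in rows]


def histogram(values, lo, width, nbins):
    """Count values into nbins equal-width bins starting at lo."""
    counts = [0] * nbins
    for value in values:
        index = int((value - lo) // width)
        if 0 <= index < nbins:  # values outside the range are ignored
            counts[index] += 1
    return counts


rows = [
    {"g": 19.0, "r": 18.7},
    {"g": 19.8, "r": 19.0},
    {"g": 19.5, "r": 18.3},
    {"g": 19.9, "r": 19.8},
]

# Synthetic column: a color computed from two magnitude columns.
with_color = add_synthetic_column(rows, "g_minus_r", lambda row: row["g"] - row["r"])

# Filter: keep only the brighter sources.
bright = [row for row in with_color if row["r"] < 19.5]

# Histogram of the synthetic column for the filtered rows.
counts = histogram([row["g_minus_r"] for row in bright], lo=0.0, width=0.5, nbins=3)
print(counts)
```

In the Portal these steps would be driven from the UI; the sketch only shows how the three operations compose.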
Portal Implementation
- The Portal is based on the Firefly framework and libraries for the construction of astronomical archive interfaces
  - Firefly originated with the Spitzer Heritage Archive and provides the underpinnings of the NASA Infrared Science Archive (IRSA)
- Firefly has a client-server, distributed architecture that enables scalable handling of large result sets and large numbers of users
  - The Firefly server is a Java application
  - The Firefly client is a JavaScript application built on the React framework
- Firefly provides a variety of native visualization tools (images, plots, tables) and a shared data model for query results that supports brushing and linking and other exploratory data analysis operations
- Connection to Python: the Portal will rely on the LSST stack and the Science Platform's computing resources to enable LSST-specific extensions to the core Firefly capabilities
  - Example: the visualization of the LSST footprint / de-blending model
Discovery and Connections
- Provides structured environment to guide user in exploring LSST data
- Highlight the most important science data products but also provide access to all other released data, including the Reformatted Engineering and Facilities Database (EFD)
- Metadata content, related documents (written by SUIT or other systems)
- Data connections (examples):
  - Find all the Processed Visit Images (PVIs) used to make a coadded image
  - Display the images that an object was extracted from
  - Display the light curve data for a selected object
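The first two data-connection examples amount to following links between data products. The sketch below shows one way such links could be traversed, using in-memory stand-ins. All identifiers and the two link tables are hypothetical; the actual LSST schema and services are not described in this talk.

```python
from collections import defaultdict

# Hypothetical link data: which PVIs went into which coadd, and which
# coadd each object was extracted from. Not the LSST schema.
coadd_inputs = [
    ("coadd-A", "pvi-001"),
    ("coadd-A", "pvi-002"),
    ("coadd-B", "pvi-002"),
    ("coadd-B", "pvi-003"),
]
object_to_coadd = {"obj-42": "coadd-A", "obj-77": "coadd-B"}

# Index the (coadd, PVI) pairs once so lookups are cheap.
pvis_by_coadd = defaultdict(list)
for coadd_id, pvi_id in coadd_inputs:
    pvis_by_coadd[coadd_id].append(pvi_id)


def pvis_for_coadd(coadd_id):
    """Find all PVIs used to make a coadded image."""
    return list(pvis_by_coadd.get(coadd_id, []))


def pvis_for_object(object_id):
    """Follow object -> coadd -> PVIs; empty list if the object is unknown."""
    coadd_id = object_to_coadd.get(object_id)
    return pvis_for_coadd(coadd_id) if coadd_id is not None else []


print(pvis_for_coadd("coadd-A"))
print(pvis_for_object("obj-77"))
```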
Data Query
- Provide customized query panels to support contextual workflows for a finite set of data products (~10)
- Provide a generic query builder for all LSST data products, with metadata information, descriptions, and data types to help users construct desired queries, as well as the ability to issue an ADQL statement
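As a rough illustration of what a UI-driven query builder produces, the sketch below assembles an ADQL cone-search statement from a few form-like inputs. The function name, the table name `example.Object`, and the column names are assumptions for illustration only; the `CONTAINS`/`POINT`/`CIRCLE` geometry functions and `TOP` are standard ADQL 2.0.

```python
def build_cone_search_adql(table, columns, ra_deg, dec_deg, radius_deg,
                           ra_col="ra", dec_col="decl", max_rows=None):
    """Return an ADQL cone-search query string.

    Table and column names are hypothetical placeholders, not the LSST schema.
    """
    if radius_deg <= 0:
        raise ValueError("radius_deg must be positive")
    if not columns:
        raise ValueError("at least one column is required")
    # Optional row limit, expressed with ADQL's TOP clause.
    top = f"TOP {int(max_rows)} " if max_rows is not None else ""
    select_list = ", ".join(columns)
    return (
        f"SELECT {top}{select_list} FROM {table} "
        f"WHERE CONTAINS(POINT('ICRS', {ra_col}, {dec_col}), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1"
    )


query = build_cone_search_adql(
    "example.Object", ["objectId", "ra", "decl"],
    ra_deg=9.5, dec_deg=-1.1, radius_deg=0.05, max_rows=100,
)
print(query)
```

A user comfortable with ADQL could write the same statement by hand; the builder only saves the user from having to know the geometry syntax.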
Data Visualization and Exploration
- Brushing and linking data displays: images, tables, plots, and histograms
- Use LSST pipeline modules to enable LSST-specific data visualization
  - Footprint, deblending model
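Brushing and linking means that a selection made in one display is reflected in every other display of the same data. A minimal sketch of the idea, assuming a single table object that owns the selection and notifies its views, is given below. The class names and sample data are hypothetical and this is not Firefly's actual data model.

```python
from bisect import bisect_right


class SharedTable:
    """Rows plus one selection shared by every linked view."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.selected = set()  # indices of the currently brushed rows
        self._views = []

    def link(self, view):
        self._views.append(view)
        view.refresh(self)

    def brush(self, predicate):
        """Select rows matching predicate and update all linked views."""
        self.selected = {i for i, row in enumerate(self.rows) if predicate(row)}
        for view in self._views:
            view.refresh(self)


class HistogramView:
    """Bins one column of the selected rows."""

    def __init__(self, column, edges):
        self.column, self.edges = column, list(edges)
        self.counts = [0] * (len(self.edges) - 1)

    def refresh(self, table):
        self.counts = [0] * (len(self.edges) - 1)
        for i in table.selected:
            index = bisect_right(self.edges, table.rows[i][self.column]) - 1
            if 0 <= index < len(self.counts):
                self.counts[index] += 1


class OverlayView:
    """Stands in for an image overlay that highlights selected sources."""

    def __init__(self):
        self.highlighted = []

    def refresh(self, table):
        self.highlighted = sorted(table.rows[i]["id"] for i in table.selected)


table = SharedTable([
    {"id": 1, "mag": 17.5},
    {"id": 2, "mag": 18.4},
    {"id": 3, "mag": 19.1},
    {"id": 4, "mag": 19.8},
])
hist = HistogramView("mag", edges=[17, 18, 19, 20])
overlay = OverlayView()
table.link(hist)
table.link(overlay)

table.brush(lambda row: row["mag"] > 18.0)  # one brush updates both views
print(hist.counts, overlay.highlighted)
```

Keeping the selection in the shared table, rather than in any one view, is what lets an arbitrary number of displays stay consistent.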
More Visualizations
- Flexible layout scheme allows assembling many linked visualizations
Many Faces of Firefly
Portal Workspace Connections
- Provide access to user workspace and user databases
  - User workspace: private file-oriented read-write storage (VOSpace)
  - User databases: import user data to join; retain and export query results
  - Access controls for user data (enabling collaboration)
- Provide UI for user to understand their storage and compute resources
- Provide UI for user to manage query history
Portal Documentation
- Guide users to documentation of various types
- SUIT team will write online help, a user guide, and API documentation
- Portal will provide links to other documentation: data products, science pipeline code, survey strategy, data quality reports, etc.
Cross-Aspect Connections (diagram labels)
- Portal (Query/Visualize: xyz), User Workflows, Notebook (Analyze: xyz)
- Data/Query ID: xyz (shown on both the Portal and Notebook sides)
- APIs: unifying infrastructure
- Connections enable sharing of data between components and therefore more complex workflows and analysis
Cross-Aspect Connections
- Users are not confined to one aspect or the other
- The design provides simple data connections among all the aspects of the Science Platform
  - Enabled by the common DAX architecture and computing infrastructure
  - Enabled by the use of JavaScript tools for visualization, and the APIs of the Portal's Firefly infrastructure
- Find or create data in one aspect; view or analyze that data in another aspect
- Queries are shareable across the Portal and the Notebook, e.g.:
  - Build a query in the Portal UI; verify the results by browsing them in the UI; access the results from the Notebook for further analysis
  - Code a complex ADQL query in the Notebook; browse results in the Portal
  - Capture a query formulated in the Portal as code reusable later as a Notebook-driven query
- Connections can be made either through identity (DAX knows the queries you have recently performed) or through simple UI actions (e.g., copy/paste of a query token between windows)
- Moving from data discovery to analysis
  - We expect to provide a one-click means of launching a fresh notebook with pre-configured access to the results of data identified in the Portal
- Large query support
  - DAX interfaces support paging
  - Portal will integrate this with user-level paging
  - Still thinking about what the ultimate constraints will be
  - LSST will provide a separate Bulk Download Service for very large-scale exports
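The paging idea under "Large query support" can be sketched generically: a client asks for one page at a time and stops when a short page comes back. The `fetch_page(offset, limit)` callable below is a hypothetical stand-in for a paged service call; the real DAX paging interface is not specified in this talk.

```python
def iter_rows(fetch_page, page_size=1000):
    """Yield rows one at a time while requesting them one page at a time.

    fetch_page(offset, limit) is a hypothetical callable that returns a list
    of at most `limit` rows starting at row `offset`.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:  # a short (or empty) page is the last one
            return
        offset += page_size


# In-memory stand-in for a large query result held by a service.
_RESULT = [{"row": i} for i in range(2500)]
calls = []


def fake_fetch(offset, limit):
    calls.append((offset, limit))
    return _RESULT[offset:offset + limit]


rows = list(iter_rows(fake_fetch, page_size=1000))
print(len(rows), len(calls))
```

Because `iter_rows` is a generator, a caller that only looks at the first few rows triggers only the first page request.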
Deployments of the Science Platform
- Customers of the Science Platform: LSST Data Rights Scientists, Science Collaborations, and LSST Project staff
- Deployments
  - US and Chilean Data Access Centers: supporting LSST's science users
  - Commissioning Cluster: supporting the process of commissioning the Summit systems (Telescope, Camera, OCS), and providing resources for the Observatory engineering team in the longer run
  - Science Validation Environment: supporting QC, verification, and validation work on software releases and on Data Releases
  - Integration Environment (a.k.a. Prototype Data Access Center, PDAC); Analysis & Developer Support Domain: supporting integration and test of the Science Platform itself, pre-release verification of the LSP
Deployment Schedule
- Integration Environment (i.e., PDAC): now
  - Initial version of PDAC was released in December 2016
  - Started with demonstration of integration of basic Portal Aspect tools with DAX APIs
  - Regular updates with additional data and functionality are planned (next slide)
- Science Validation Environment: end of 2017, maturing in 2018
  - Planned to support the Science Pipelines development and DM verification & validation activities
  - Initially focused on the HSC public-release data
  - Key initial features: add Notebook Aspect, add basic user database support
- Commissioning Cluster: mid 2019
  - Supports clean-room testing of ComCam and the main Camera, and spectrograph operations
  - All Aspects; key initial features: generic table access and support for EFD data access
- US Data Access Center: 2021
  - Provides outside access to commissioning data; use for Commissioning of DM starts earlier!
- Chilean Data Access Center: 2022
PDAC Evolution
- PDACv1: December 2016
  - SDSS Stripe 82 data, basic Portal functionality
  - Tests basic integration of Portal tools with Qserv & DAX services
- PDACv2: Now (July 2017)
  - Add WISE data (AllWISE catalog tables, image access)
  - NEOWISE single-epoch photometry later, as ingest files become available from IRSA
  - Scaling tests for Qserv; additional Portal features
  - Basic login capabilities
- PDACv3: Late 2018
  - Add HSC data
  - Visualizations ready for LSSTCam support; Portal UX improvements
  - Notebook Aspect (JupyterLab) added
  - Authentication and authorization used throughout
  - User workspace
- User testing of each version
  - PDACv1: DMTR-22
PDAC Evolution (2)
- PDACv4: Fall 2019
  - Portal configuration to support ComCam data
  - All-sky visualizations; further UX improvements
  - Full user table support
- PDACv5: 2020
  - Next-to-database processing
  - Prototype of DAC Portal UX
- Continued use of PDAC to deploy new versions all the way through start of operations
- PDAC will be used for (internal) system integration and scaling tests, and
- User testing will be performed at each major release; we plan to reach out to a slowly growing set of science users to
Access for Data Rights Holders
- Full access to the Data Access Centers' Science Platform environments will come with the start of operations
- We plan to release commissioning data to data rights holders before that, but
  - Expect a months-long period of analysis by the team before the data are released
  - This will be on a best-effort basis by the pre-operations team
  - The current commitment is only to provide a bulk download facility
Portal Demo
- Demonstration of the Portal Aspect in the current version of PDAC
Science Platform (architecture diagram labels)
- Identity Provider
- Users with browsers: Web Portal; Firefly Widgets and Jupyter Client
- Firefly Servers
- JupyterLab Servers (managed by JupyterHub), IPython Kernel
- Firefly Python microservices
- DAX; DB (Qserv, other); Image Storage
- External users of LSST APIs: TAP, SIA (https)
- VOSpace (WebDAV) (https); User Storage
- User Computing; Parallel/Batch Computing
IPAC (architecture diagram labels)
- Identity Provider
- Users with browsers: Web Portal; Firefly Widgets and Jupyter Client
- Firefly Servers (load-balanced)
- JupyterLab Servers (managed by JupyterHub), IPython Kernel
- Firefly Python microservices
- DAX; DB (Qserv, other); Image Storage
- External users of LSST APIs: TAP, SIA (https)
- VOSpace (WebDAV) (https); User Storage
- User Computing; Parallel/Batch Computing