escience in the Cloud Dan Fay Director Earth, Energy and Environment

dan.fay@microsoft.com

Scientific Data Intensive Computing Workshop 2004

Visualizing and Experiencing E3 Data + Information: Provide a unique experience to reduce time to insight and knowledge through visualizing data and information
Accessible Data: Ensure E3 data (remote and local sensing) is easily discoverable, accessible and consumable in the scientist's domain
Enabling Scientific Collaboration: Look at new ways to enable collaboration in scientific virtual organizations

Experimental Science
Theoretical Science: Newton's Laws, Maxwell's Equations
Computational Science: Simulation of complex phenomena
Data-Intensive Science: data captured by instruments, generated by simulations, generated by sensor networks
$\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$

http://fourthparadigm.org

The Problem for the e-scientist
Sources of facts: Experiments & Instruments, Simulations, Literature, Other Archives
Questions in, Answers out
o Data ingest
o Managing a petabyte
o Common schema
o How to organize it
o How to reorganize it
o How to share with others
(With thanks to Jim Gray)
The Generic Problems
o Query and Vis tools
o Building and executing models
o Integrating data and Literature
o Documenting experiments
o Curation and long-term preservation

Stages: Forecasting, Integration, Analysis, Reporting, Distribution, Aggregation, Quality assurance, Collation, Monitoring
Legend:
o Done poorly, but a few notable counter-examples
o Done poorly to moderately, not easy to find
o Sometimes done well, generally discoverable and available, but could be improved
(I. Zaslavsky & CSIRO, BOM, WMO)

Environmental Ecosystem Knowledge Action Inform

Environmental Ecosystem Knowledge Action Analysis Data Insight Inform Decide Communicate Implement Publish Technology

Manual Measurement Automated Measurement Sample Collection Typing Historical Photographs Satellite Aircraft Surveys Model Output Counting Relatively Ubiquitous Motes

Why Make this Distinction?
Provenance and trust vary widely
o Data acquisition, early processing, and reporting range from a large government agency to individual scientists.
o Smaller data is often passed around in email; big data downloads can take days (if at all)
Data sharing concerns and patterns vary
o Open access followed by (non-repeatable and tedious) pre-processing
o True science-ready data set, but concerns about misuse and misunderstanding, particularly for hard-won data.
Computational tools differ
o Not everyone can get an account at a supercomputer center
o Very large computations require engineering (error handling)
o Space and time aren't always simple dimensions
Figure labels: KB, GB, TB, PB; Complex shared detector; Complex and heavy process by experts; Simple instrument (if any); Ad hoc observations and models
Science happens when PBs, TBs, GBs, and KBs can be mashed up simply

[Figure: AzureMODIS Service pipeline. A scientist submits requests through the Web Role Portal to the Request Queue. The Data Collection Stage gathers Source Imagery and Source Metadata from the Download Sites. Work then passes through the Reprojection Queue to the Reprojection Stage, and through the Reduction Queue to the Analysis/Reduction Stage, which produces Scientific Results.]
Catharine van Ingen (Microsoft Research), Jie Li, Marty Humphrey (UVA), Youngryel Ryu (UCB), Deb Agarwal (BWC/LBL)
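The queue-between-stages layout named on this slide can be sketched in a few lines of Python. This is only an illustration of the pattern, not the AzureMODIS code: the in-process `queue.Queue` stands in for cloud queues, and `download`, `reproject`, `reduce_tile`, and the tile identifiers are hypothetical placeholders.

```python
import queue
import threading

# Hypothetical stand-ins for the real work; the slide does not describe
# the actual download, reprojection, or reduction code.
def download(tile_id):
    return {"tile": tile_id, "pixels": [1, 2, 3, 4]}

def reproject(tile):
    return {**tile, "reprojected": True}

def reduce_tile(tile):
    return {"tile": tile["tile"], "mean": sum(tile["pixels"]) / len(tile["pixels"])}

STOP = object()  # sentinel that tells a worker to shut down

def stage(inbox, outbox, work):
    """Pull items from inbox, apply work, and push results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            break
        outbox.put(work(item))

def run_pipeline(tile_ids):
    """Run collection, reprojection, and reduction as queue-linked workers."""
    request_q, reprojection_q, reduction_q, results_q = (
        queue.Queue() for _ in range(4)
    )
    workers = [
        threading.Thread(target=stage, args=(request_q, reprojection_q, download)),
        threading.Thread(target=stage, args=(reprojection_q, reduction_q, reproject)),
        threading.Thread(target=stage, args=(reduction_q, results_q, reduce_tile)),
    ]
    for worker in workers:
        worker.start()
    for tile_id in tile_ids:
        request_q.put(tile_id)
    request_q.put(STOP)
    for worker in workers:
        worker.join()
    results = []
    while True:
        item = results_q.get()
        if item is STOP:
            break
        results.append(item)
    return results
```

Because each stage only talks to its neighbours through a queue, a stage can in principle be scaled or retried without touching the others, which is the usual motivation for this layout.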

Microsoft Codename "Cloud Numerics": numerical and data analytics library for data scientists, quantitative analysts, and others
Microsoft Codename "Data Explorer": organize, manage, mash up and gain new insights from your data
Microsoft Codename "Data Hub": lets organizations curate and publish their data on a private data marketplace
Microsoft Codename "Trust Services": data encryption services for cloud applications so that they can roam encryption keys in a secure way

Common Problems with Data
To use data from different sources
o Non-standard formats, scales, and units
o Lack of data quality control
o Lack of metadata
o Difficult to repurpose data for different (my) tools
To share data
o Lack of incentive (no credit)
o Need extra resources and tools
Hidden problems, seldom addressed
o Versioning
o Provenance
o Curation
Data Sources (figure labels): SQL, XML, Data Cube, CSV
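Several of the problems listed on this slide (non-standard units, missing metadata, versioning, provenance) can be made concrete with a small sketch. The `Observation` record, the `TO_CELSIUS` table, and the file name used below are all hypothetical; the point is only that a unit conversion can carry its own version number and provenance trail.

```python
from dataclasses import dataclass, field

# Hypothetical conversion table: (factor, offset) to degrees Celsius.
TO_CELSIUS = {
    "C": (1.0, 0.0),
    "K": (1.0, -273.15),
    "F": (5.0 / 9.0, -32.0 * 5.0 / 9.0),
}

@dataclass
class Observation:
    value: float
    unit: str
    source: str                      # where the record came from
    version: int = 1                 # bumped on every transformation
    provenance: list = field(default_factory=list)

def to_celsius(obs):
    """Return a new Observation in Celsius, recording what was done."""
    if obs.unit not in TO_CELSIUS:
        raise ValueError(f"unknown unit: {obs.unit!r}")
    factor, offset = TO_CELSIUS[obs.unit]
    step = f"converted {obs.unit} -> C (v{obs.version} of {obs.source})"
    return Observation(
        value=round(obs.value * factor + offset, 2),
        unit="C",
        source=obs.source,
        version=obs.version + 1,
        provenance=obs.provenance + [step],
    )
```

The original record is left untouched, so earlier versions stay available for anyone who needs to repeat or audit the pre-processing.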

Current State of Data Ecosystem
Applications: Android, iPhone, Windows Phone, WebOS, Excel, Java, Silverlight, .NET, AJAX, PHP, MATLAB
Data Sources: SQL, XML, Data Cube, CSV (Cloud Service, HPC Cluster, DB Server, Data Server, Web server)

Advance data discoverability, accessibility, and consumability
OData (http://www.odata.org)
o Provides a way to unlock your data and free it from data silos
o Does this by building upon Web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON to provide access to information from a variety of applications, services, and stores
o In Open Source/Specifications Promise; being submitted to OASIS
o A Web protocol for querying and updating data
o An application of a set of internet standards: HTTP, Atom (RFC 4287), AtomPub (RFC 5023), REST semantics
o It allows you to form URLs based on what you know about the underlying data
o Existing standards + easy data access API
o Added Geospatial data support
o Feedback from the Community encouraged
Figure labels: PivotViewer, Marketplace, Spatial, SQL, Maps
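The point about forming URLs from what you know about the data can be illustrated with a short Python sketch. The service root `https://example.org/catalog.svc`, the `Objects` entity set, and the `Magnitude` property are hypothetical; `$filter`, `$top`, `$orderby`, and `$select` are standard OData system query options. The helper only builds the URL and does not contact any service.

```python
from urllib.parse import quote, urlencode

def odata_url(service_root, entity_set, filter_expr=None, top=None,
              orderby=None, select=None):
    """Build an OData query URL from a service root and query options."""
    options = {}
    if filter_expr is not None:
        options["$filter"] = filter_expr
    if top is not None:
        options["$top"] = top
    if orderby is not None:
        options["$orderby"] = orderby
    if select is not None:
        options["$select"] = ",".join(select)
    url = f"{service_root.rstrip('/')}/{entity_set}"
    if options:
        # Keep '$', ',', quotes, and parentheses readable; encode spaces as %20.
        url += "?" + urlencode(options, quote_via=quote, safe="$,'()")
    return url

# Hypothetical example: the ten brightest objects below magnitude 18.
example = odata_url(
    "https://example.org/catalog.svc/",
    "Objects",
    filter_expr="Magnitude lt 18",
    top=10,
)
```

Any HTTP client can then fetch the resulting URL, which is what makes the data reachable from the many application environments listed on the previous slide.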

o 5 color images of ¼ of the sky
o Pictures of 300 million celestial objects
o Distances to the closest 1 million galaxies
o Have to publish the data before astronomers publish their analysis

Public Use of the SkyServer
o 380 million web hits in 6 years
o 930,000 distinct users vs 10,000 astronomers
o 1600 scientific papers
o Delivered 50,000 hours of lectures to high schools
o Delivered 100B rows of data
o Goal of 1 million visual galaxy classifications by the public
o Allows general public to search for photographs and classify different types of galaxies
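To put the SkyServer figures on a per-day and per-astronomer footing, the arithmetic can be written out as below. The inputs are the numbers from the slide; treating a year as 365 days is an assumption made here for simplicity.

```python
def usage_summary(web_hits, years, distinct_users, astronomers):
    """Return (average hits per day, public users per astronomer)."""
    hits_per_day = web_hits / (years * 365)  # assumes 365-day years
    users_per_astronomer = distinct_users / astronomers
    return round(hits_per_day), users_per_astronomer

# Figures quoted on the slide.
summary = usage_summary(380_000_000, 6, 930_000, 10_000)
```

That works out to roughly 170,000 hits per day on average, and 93 distinct users for every professional astronomer.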

Seamless Rich Social Media Virtual Sky and Earth
Web application for science and education
Goals
o Integration of data sets and one-click contextual access
o Easy access and use
o Tours for sharing information/insights
o Spatial/temporal visualization of 3D datasets
Updates
o API for extensibility
o Excel Add-in for easy data integration
o Not just for Astronomy: Earth Focused
o Being used in Planetariums (Cal Academy)
We invite you to experience it! www.worldwidetelescope.org

www.layerscape.org

Kinect SDK and WWT

Platform as a Service
Infrastructure as a Service
Software as a Service

http://research.microsoft.com/events/escience2012