Dennis Gannon Data Center Futures Microsoft Research

The question: What is the role of the private sector data center in the national science agenda?
- The quest to broaden access: science as a web service, science gateways
- The cloud: data centers, EC2, Hadoop, SaaS, Dryad and Azure
- Academic/industry outcomes: things that are possible

TeraGrid Science Gateways

The US National Supercomputer Grid
Cyberinfrastructure composed of a set of resources (compute and data) that provide common services for:
- Wide area data management (GridFTP, GPFS, staged disk to tape)
- Single sign-on user authentication (Globus Toolkit)
- Distributed job scheduling and management (in the works)
Collectively: 1 petaflop, 20 petabytes. Soon to triple. Will add a new petaflop machine each year.

A large-scale operational cyberinfrastructure that provides very high-end computational capabilities for leading-edge research in the national open science community (TeraGrid Deep) while also reaching out to new users and communities to broaden the impact of cyberinfrastructure in research, education, and society (TeraGrid Wide)

How do we provide scientific services to communities (VOs) that need high-end computation and data analysis but have no desire to use supercomputers?
They want:
- Access to advanced domain-specific analysis and simulation tools
- Wide area data storage (indexing and curation)
- Collaboration tools
Start with a supercomputer user who wants to provide app services to others through a science portal. Examples.
(Figure labels: the true supercomputer user; the rest of the domain scientists)

On-demand prediction of tornadoes and hurricanes
To support three classes of users:
- Meteorology research scientists and grad students
- Undergrads in meteorology classes
- People who want easy access to weather data
Go to: http://www.leadproject.org

A Framework for Discovery
Four basic components:
- Data discovery: catalogs and index services
- The experiment: computational workflow managing on-demand resources
- Data analysis and visualization
- Data product preservation: automatic metadata generation and experimental data provenance

What did we learn?
- A simple portal for data access is easy.
- Providing index and upload/catalog is harder.
- Providing on-demand scalable services for dozens or hundreds of users is very hard.
Why?
- TeraGrid is best effort.
- Fault tolerance is very hard. SC users learn to tolerate faults.
- Basic services are extremely unreliable under load.

Putting reliability and scalability first

Current DCs
- 100K+ servers
- 100 MW power
- Evolved from off-the-shelf PCs to specially configured racks
- Modular, containerized parking garages
Specialized functions and layered architecture, designed for RELIABILITY and SCALABILITY:
- Client interfaces
- Routing and mediating services
- Computation management
- Persistent data
- Resource management

The approaches define a space of solutions:
- OS virtualization
- Parallel frameworks
- Software as a service
(Figure: the data center cloud application space, spanned by OS virtualization, parallel frameworks, and software as a service)

Simple idea (promoted by Amazon EC2):
- Provide a platform that allows app designers to upload a VM image, store it, and then instantiate copies on demand.
- Give app designers a menu of VM choices: flavors of Linux and Windows with standard web servers and database components.
- Give them basic web services to manage instances and back-end data.
- Requires sys-admin-level management.
- Third-party companies provide high-level app config tools (RightScale, GigaSpaces, Elastra, 3Tera, ...).

Deploy a datacenter-wide application framework that makes it easy to build highly parallel data analysis applications.
Use simple parallel templates with the inversion-of-control concept:
- The app designer provides the kernel of the data analysis application.
- The framework controls parallel execution and access to the parallel file system and data structures.
(Figure: a data collection feeds parallel map tasks, whose output feeds reduce tasks that produce a new data collection)
- Map: apply the application kernel function to data chunks in parallel.
- Reduce: apply the application data reduction filter to the map output.
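The inversion-of-control idea can be illustrated with a small, single-machine sketch. This is not the API of Hadoop or any real framework: `run_map_reduce`, `count_words`, and `sum_counts` are hypothetical names, and a thread pool stands in for a data center. The point is only that the application supplies the two kernels while the framework decides how they run.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_map_reduce(chunks, map_kernel, reduce_filter, workers=4):
    """Toy framework: it owns the parallelism; the caller supplies only the kernels."""
    # Map phase: apply the application kernel to each data chunk in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_kernel, chunks))

    # Shuffle: group every emitted (key, value) pair by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: apply the reduction filter to each key's list of values.
    return {key: reduce_filter(key, values) for key, values in groups.items()}


# Application-supplied kernels (hypothetical word-count example).
def count_words(chunk):
    return [(word.lower(), 1) for word in chunk.split()]


def sum_counts(key, values):
    return sum(values)


chunks = ["the cloud scales", "the grid computes", "the cloud is reliable"]
word_counts = run_map_reduce(chunks, count_words, sum_counts)
```

The application code never mentions threads or data placement; swapping the thread pool for a cluster scheduler would not change `count_words` or `sum_counts`.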

Google has made MapReduce famous.
- Based on the Google File System (GFS): a parallel, distributed, redundant, read-often, write-infrequently file system.
- BigTable: a parallel data structure built on GFS.
  - Two-dimensional sparse map.
  - Cells are time stamped to allow for history.
  - BigTable can be used as a parallel input or output structure for MapReduce computations.
- Open source version: Hadoop, developed largely at Yahoo!
- Part of NSF big data program.
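To make the "two-dimensional sparse map with time-stamped cells" description concrete, here is a minimal in-memory sketch. It is an illustration only, not BigTable's actual interface: the class name `SparseTimestampedTable`, its methods, and the sample row and column names are all assumptions.

```python
import bisect
from collections import defaultdict


class SparseTimestampedTable:
    """Toy stand-in for a sparse (row, column) map whose cells keep history."""

    def __init__(self):
        # Only cells that were written occupy memory. Each cell holds two
        # parallel lists, kept sorted by timestamp: (timestamps, values).
        self._cells = defaultdict(lambda: ([], []))

    def put(self, row, column, timestamp, value):
        timestamps, values = self._cells[(row, column)]
        i = bisect.bisect_right(timestamps, timestamp)
        timestamps.insert(i, timestamp)
        values.insert(i, value)

    def get(self, row, column, at=None):
        """Return the newest value, or the newest value written at or before `at`."""
        cell = self._cells.get((row, column))
        if cell is None:
            return None
        timestamps, values = cell
        if at is None:
            return values[-1]
        i = bisect.bisect_right(timestamps, at)
        return values[i - 1] if i else None


table = SparseTimestampedTable()
table.put("com.example/index", "contents", 1, "v1")
table.put("com.example/index", "contents", 5, "v2")

latest = table.get("com.example/index", "contents")
earlier = table.get("com.example/index", "contents", at=3)
missing = table.get("no.such.row", "contents")
```

Reading with `at=3` returns the older version, which is the "history" that the time stamps allow.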

MapReduce is only one instance of many possible parallel execution templates.
- Simple parallel workflow/macrodataflow/systolic constructs can be used to create arbitrarily nested, massively parallel execution patterns.
- It is possible to build control and execution frameworks to run these on large data centers.
- The parallelism effectively exploits manycore.
- Microsoft Dryad and DryadLINQ.
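A more general template than map-then-reduce is a dataflow graph in which each stage runs once its inputs are ready. The sketch below is a toy under stated assumptions, not Dryad's programming model or API: `run_dataflow`, the stage names, and the use of a local thread pool are all hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_dataflow(stages, deps, workers=4):
    """Run stage functions in dependency order; independent stages run in parallel.

    stages: name -> function taking the list of its predecessors' outputs
    deps:   name -> list of predecessor names (order fixes the argument order)
    """
    sorter = TopologicalSorter({name: deps.get(name, []) for name in stages})
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Every stage returned here has all of its inputs available.
            futures = {
                name: pool.submit(stages[name], [results[d] for d in deps.get(name, [])])
                for name in sorter.get_ready()
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results


# Hypothetical four-stage graph: two loads, one transform, one merge.
stages = {
    "load_a": lambda inputs: [1, 2, 3],
    "load_b": lambda inputs: [10, 20],
    "square_a": lambda inputs: [x * x for x in inputs[0]],
    "merge": lambda inputs: sum(inputs[0]) + sum(inputs[1]),
}
deps = {"square_a": ["load_a"], "merge": ["square_a", "load_b"]}
results = run_dataflow(stages, deps)
```

Here `load_a` and `load_b` have no dependencies, so the framework is free to run them at the same time; the application only declares the graph.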

The role of the cloud is to provide a place where application suppliers can make apps available to clients.
- The applications are then hosted services.
- The cloud automatically scales to meet client demand.
- The cloud is reliable and robust.
The data center provides the tools and core services that make it easy to build the apps.
- Cohesive has a Ruby on Rails engine for cloud app deployment.
- Google AppEngine is a Python runtime with APIs to access things like BigTable.
- Sun's Project Caroline is based on spawning remote Java VMs.

Designed to allow anybody to build scalable services.
- Access to large distributed storage: large flat structured store or databases.
- You write your own service components.
- The Azure framework loads them into a VM and manages your scale-out requirements.
- Beta now, full production by 2010.

A Few Examples from Tony Hey and Roger Barga at MSR

Research Information Center
In collaboration with The British Library
Virtual Research Environment for Science, Technology and Medicine, built using MOSS 07 and Office 2007
Technical goals:
- SharePoint as a place to do research and support collaborations
- Tools and services to tackle inefficiencies and pain points in research
- Simplifying searching for information, discovery, alerts, and managing research objects such as papers, references, bookmarks, proposals, etc.
- Built initially to support science, technology and medicine, but general enough to support any research area
Value to researchers:
- Can be used for any research collaboration to support collaboration and manage research objects

Trident Scientific Workflow Workbench
Univ. of Washington and Monterey Bay Aquarium Research Institute
Scientific workflow workbench to automate the data processing pipelines of the world's first plate-scale undersea observatory
Technical goals:
- From raw data to usable data products (visualizations)
- Focusing on cleaning, analysis, regridding, interpolation
- Support real-time, on-demand visualizations
- Custom activities and workflow libraries for authoring
- Visual programming accessible via a browser
Value for researchers:
- A scientific workflow workbench for a number of science projects, reusable workflows, automatic provenance capture

Famulus Research Output Repository
A platform for building services and tools for research output repositories:
- Papers, videos, presentations, lectures, references, data, code, etc.
- Relationships between stored entities
Technical goals:
- Support the publishing and dissemination platform for all researcher outputs
- Enable a tools and services ecosystem for research output repositories on MS technologies

Seamless Rich Social Media Virtual Sky
Web application for science and education
Project organization:
- Alyssa Goodman (Harvard)
- Alex Szalay (JHU)
- Curtis Wong, Jonathan Fay (MSR)
Project goals:
- Science: seamless integration of data sets and one-click contextual access
- Education: easy as PowerPoint
Development launch: http://www.worldwidetelescope.org/
(Screenshots: WWT application, WWT download website)

Scientific advances are increasingly made by harvesting knowledge from streams of data.
Sensor networks are critical to geoscience, physics, engineering, and economics.
Given access to the right data streams and on-demand access to computation, you can:
- Manage the energy consumption of a large city
- Monitor an active earthquake zone and provide warnings that can save lives
- Predict tornadoes
- Do the motion planning for swarms of remote robots exploring the ocean floor
- Monitor the health of the planet's food supply
- Find the Higgs boson

How can we advance research by creating private sector + university + government collaboration? Beyond point solutions, what level of infrastructure outsourcing can help? Where can the tools of large data/web analysis used in the private sector be leveraged by the research community?

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.