Dr. Fabrizio Gagliardi

Dr. Fabrizio Gagliardi
EMEA Director, External Research, Microsoft Research
I3 - Internet - Infrastructures Innovations
PSNC, Poznan (PL), November 2009

Most of these slides come from Dennis Gannon, Director, and Dan Reed, CVP, of the MS Extreme Computing Group (XCG), both long-time HPC pioneers now with Microsoft Research.

We are at an inflection point in the evolution of distributed computing (nothing new under the sun).
- Grid remains a good solution for a limited number of communities (often for social/political reasons).
- Cloud computing and hosted services are emerging as the next incarnation of distributed computing, with some obvious additional advantages (think of data centres located in Iceland or next to cheap and renewable energy sources).
11/12/2009 GridKA 2008, Karlsruhe

MSR Definition: Cloud computing means using a remote data center to manage scalable, reliable, on-demand access to applications.
- Scalable means possibly millions of simultaneous users of the app, and exploiting thousand-fold parallelism in the app.
- Reliable, on-demand means "5 nines" (99.999%) available, right now.
- Applications span the continuum from client to the cloud.
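As a rough illustration of what "5 nines" implies in practice, the sketch below converts an availability expressed as a number of nines into a yearly downtime budget. It assumes a 365-day year, and the function name is illustrative rather than taken from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines.

    For example, nines=5 means 99.999% availability.
    """
    unavailability = 10.0 ** (-nines)  # fraction of time the service may be down
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Under that assumption, five nines leaves a little over five minutes of downtime per year, which is why the definition pairs reliability with scale: a single server cannot meet that budget on its own.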

Philosophy: The data center is a computer that must be designed and programmed as an integrated system.
[Diagram labels: on-chip, FLASH, PCM, low power, virtualization, optical interconnect, multicore, heterogeneity, processors, optics, distributed routing, non-TCP/IP, storage, chip stacking, modularity, liquid cooling, over-provisioned, networks, introspection, tier-splitting, adaptation, resilience, packaging, software]

How do you
- support email for 375 million users?
- store and index 6.75 trillion photos?
- support 10 billion web search queries/month?
- deliver a quality response in 0.15 seconds to millions of simultaneous users?
- never go down?
The future goes well beyond web search.

[Diagram labels: Experiments, Simulations, Archives, Literature, Consumer; petabytes, doubling every 2 years]
The Challenge: Enable discovery. Deliver the capability to mine, search and analyze this data in near real time.
The Response: A massive private sector build-out of data centers.

Data centers range in size from edge facilities to megascale.
Economies of scale: approximate costs for a medium-sized center (1,000 servers) and a large, 50K-server center.

Technology      Medium-sized data center      Very large data center         Ratio
Network         $95 per Mbps/month            $13 per Mbps/month             7.1
Storage         $2.20 per GB/month            $0.40 per GB/month             5.7
Administration  ~140 servers/administrator    >1000 servers/administrator    7.1

Each data center is 11.5 times the size of a football field.

Conquering complexity. Building racks of servers & complex cooling systems all separately is not efficient. Package and deploy into bigger units:

The EPA released a report saying that in 2006 data centers:
- used 61 terawatt-hours of electricity, 1.5% of all US electrical energy use
- ran up a total power bill of $4.5 billion
- drew a 7 GW peak load (15 power plants)
- emitted 44.4 million metric tons of CO2 (0.8% of emissions)
Consumption was expected to double by 2011.
A new challenge and a green initiative: a deeper look and a few ideas.
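A quick consistency check on the EPA figures can be sketched as follows. The three inputs are the slide's numbers. The two derived quantities (average price per kWh and total US consumption) are not stated on the slide and are only what those inputs imply. The function names are illustrative.

```python
ENERGY_TWH = 61.0     # from the slide: data center electricity use in 2006
BILL_USD = 4.5e9      # from the slide: total power bill
SHARE_OF_US = 0.015   # from the slide: 1.5% of US electrical energy use


def implied_price_per_kwh(bill_usd: float, energy_twh: float) -> float:
    """Average electricity price implied by a bill and an energy total."""
    kwh = energy_twh * 1e9  # 1 TWh = 1e9 kWh
    return bill_usd / kwh


def implied_total_twh(energy_twh: float, share: float) -> float:
    """Total consumption implied when `energy_twh` is `share` of the whole."""
    return energy_twh / share


if __name__ == "__main__":
    print(f"Implied price: ${implied_price_per_kwh(BILL_USD, ENERGY_TWH):.4f}/kWh")
    print(f"Implied US total: {implied_total_twh(ENERGY_TWH, SHARE_OF_US):.0f} TWh")
```

The implied price comes out at roughly $0.074 per kWh, in line with the $0.07 per kWh used in the cost breakdown on the following slide.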

Where are the costs? Mid-sized facility (20 containers):
- Cost of power ($/kWh): $0.07
- Cost of facility: $200,000,000 (amortized over 15 years)
- Number of servers: 50,000 (3-year life) at $2K each
- Power critical load: 15 MW
- Power Usage Effectiveness (PUE): 1.7
Observe:
- Fully burdened cost of power = power consumed + cost of cooling and power distribution infrastructure.
- As the cost of servers drops and power costs rise, power will dominate all other costs.
[Chart: monthly costs with 3-year server and 15-year infrastructure amortization. Categories: Servers, Power & Cooling Infrastructure, Power, Other Infrastructure. Values shown: $284,686; $1,042,440; $1,296,902; $2,997,090.]
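The slide does not say how the four monthly figures were derived. The sketch below is one possible reconstruction from the stated inputs, using a standard level-payment amortization formula. Four parameters are assumptions that do not appear on the slide and were picked because they reproduce its numbers: a 5% annual cost of money, 80% average utilization of the critical load, an 82/18 split of facility cost between power-and-cooling and other infrastructure, and 730 hours per month. All function and key names are illustrative.

```python
def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Level monthly payment that amortizes `principal` over `months`."""
    r = annual_rate / 12.0
    if r == 0:
        return principal / months
    return principal * r / (1.0 - (1.0 + r) ** (-months))


def monthly_costs(
    servers: int = 50_000,                # from the slide
    server_price: float = 2_000.0,        # from the slide
    server_life_months: int = 36,         # from the slide (3-year life)
    facility_cost: float = 200_000_000.0, # from the slide
    facility_life_months: int = 180,      # from the slide (15 years)
    critical_load_kw: float = 15_000.0,   # from the slide (15 MW)
    pue: float = 1.7,                     # from the slide
    price_per_kwh: float = 0.07,          # from the slide
    annual_cost_of_money: float = 0.05,   # ASSUMPTION, not on the slide
    avg_utilization: float = 0.80,        # ASSUMPTION, not on the slide
    power_cooling_share: float = 0.82,    # ASSUMPTION, not on the slide
    hours_per_month: float = 730.0,       # ASSUMPTION: 8760 h / 12
) -> dict:
    """Return a monthly cost breakdown in USD."""
    server_monthly = monthly_payment(
        servers * server_price, annual_cost_of_money, server_life_months
    )
    facility_monthly = monthly_payment(
        facility_cost, annual_cost_of_money, facility_life_months
    )
    # Energy drawn from the grid = IT load * PUE, scaled by average utilization.
    power_monthly = (
        critical_load_kw * pue * avg_utilization * price_per_kwh * hours_per_month
    )
    return {
        "servers": server_monthly,
        "power_cooling_infrastructure": facility_monthly * power_cooling_share,
        "power": power_monthly,
        "other_infrastructure": facility_monthly * (1.0 - power_cooling_share),
    }


if __name__ == "__main__":
    for name, value in monthly_costs().items():
        print(f"{name:30s} ${value:,.0f}")
```

Under these assumptions the results land within a few dollars of the slide's $2,997,090, $1,296,902, $1,042,440 and $284,686, which suggests (but does not confirm) that those figures correspond to servers, power and cooling infrastructure, power, and other infrastructure respectively. Adding the power bill to the power and cooling infrastructure line gives the "fully burdened cost of power" the slide refers to.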

Data centers use 1.5% of US electricity: $4.5 billion annually, 7 GW peak load (15 power plants), 44.4 million metric tons of CO2 (0.8% of emissions).
Rethink environmentals: run them in a wider range of conditions. Rethink UPS. Christian Belady's "In Tent" data center experiment. Google's battery per server. Rethink architecture: Intel Atom and power states. Marlowe Project.

Cloud apps connect people to insight from information. [Diagram labels: Experience, Discovery]
- Most cloud apps are immediate, scalable and persistent.
- The cloud is also a platform for massive data analysis. It is not a replacement for leading-edge supercomputers.
- The programming model must support scalability in two dimensions: thousands of simultaneous users of the same app, and apps that require thousands of cores for each use.

Automatic query plan generation. Distributed query execution by Dryad.
[Diagram labels: LINQ query, query plan, Dryad; select, where, logs]

    var logentries =
        from line in logs
        where !line.StartsWith("#")
        select new LogEntry(line);

LINQ: .NET Language Integrated Query
- Declarative SQL-like programming with C# and Visual Studio
- Easy expression of data parallelism
- Elegant and unified data model
Source: Yuan Yu et al.
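For readers less familiar with C#, the shape of the query above (filter out comment lines, then wrap each remaining line in a record) can be sketched as a single-machine Python analogue. This is not DryadLINQ and involves no distributed execution. `LogEntry` is a hypothetical record type, since the slide does not show its fields.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class LogEntry:
    """Hypothetical record type; the slide does not show LogEntry's fields."""
    raw: str


def parse_log(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Declarative filter + projection, evaluated lazily.

    Mirrors `where !line.StartsWith("#")` followed by
    `select new LogEntry(line)`.
    """
    return (LogEntry(line) for line in lines if not line.startswith("#"))


if __name__ == "__main__":
    sample = ["# header", "GET /index.html", "# comment", "GET /about.html"]
    for entry in parse_log(sample):
        print(entry)
```

The point of the declarative form is that the query states what to compute, not how. That is what leaves a system such as Dryad free to generate a plan and partition the work across machines.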

Infrastructure as a service: provide a way to host virtual machines on demand. Amazon EC2 and S3: you configure your VM, load and go.
Application as a service: Hadoop and Dryad are application frameworks for data-parallel analysis.
Platform as a service: you write an app to cloud APIs and release it, and the platform manages and scales it for you. Google App Engine: write a Python program to access Bigtable, upload it and run it in a Python cloud.

Infrastructure as a Service, Platform as a Service, Software as a Service

MapReduce-style parallel BLAST: take DNA samples and search for matches.
Full metagenomics sample, 363,876 records:
- 50 roles: 94,320 sec, speedup = 45
- 100 roles: 45,000 sec, speedup = 94
- Next step: 1000 roles, 20 GB input sample
Basic MapReduce: 2 GB database in each worker role, 500 MB input file.
[Architecture diagram labels: Azure Blob Storage (Genome DB 1 ... Genome DB K, BLAST DB configuration); BLAST web role (user selects DBs and input sequence); input splitter worker role; BLAST execution worker roles #1 ... #n; combiner worker role]
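The splitter / worker / combiner arrangement on this slide can be sketched in a few lines. The sketch below is only a toy illustration of the pattern. It uses an exact substring match as a stand-in for BLAST, local threads instead of Azure worker roles, and an in-memory dictionary instead of blob storage. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def split_input(records: List[str], n_workers: int) -> List[List[str]]:
    """Input splitter: deal records round-robin into n_workers chunks."""
    return [records[i::n_workers] for i in range(n_workers)]


def search_chunk(chunk: List[str], database: Dict[str, str]) -> Dict[str, List[str]]:
    """Worker: stand-in for BLAST execution.

    Uses exact substring matching, NOT sequence alignment. Each worker
    sees the whole database, as on the slide.
    """
    return {
        query: [name for name, seq in database.items() if query in seq]
        for query in chunk
    }


def combine(partials: List[Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Combiner: merge the per-worker result dictionaries."""
    merged: Dict[str, List[str]] = {}
    for partial in partials:
        merged.update(partial)
    return merged


def run(records: List[str], database: Dict[str, str], n_workers: int = 4) -> Dict[str, List[str]]:
    """Split the input, search each chunk in parallel, combine the results."""
    chunks = split_input(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: search_chunk(c, database), chunks))
    return combine(partials)


if __name__ == "__main__":
    db = {"genome1": "ACGTACGT", "genome2": "TTTTGGGG"}
    print(run(["ACGT", "GGGG", "CCCC"], db, n_workers=2))
```

Because each query is independent, the work parallelizes with almost no coordination. That is consistent with the slide's measurements: a speedup of 45 on 50 roles is 90% parallel efficiency, and 94 on 100 roles is 94%.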

Statistical tool used to analyze DNA of HIV from large studies of infected patients.
- PhyloD was developed by Microsoft Research and has been highly impactful.
- Small but important group of researchers: 100s of HIV and HepC researchers actively use it, and 1000s of research communities rely on these results.
- Cover of PLoS Biology, November 2008.
- Typical job: 10-20 CPU hours, with extreme jobs requiring 1K-2K CPU hours.
- Very CPU efficient.
- Requires a large number of test runs for a given job (1-10M tests).
- Highly compressed data per job (~100 KB per job).
Highlights Windows Azure's potential for agile deployment of science-related services that scale.
Courtesy of Roger Barga.

- We have built plugins for Matlab to talk to the cloud.
- Excel spreadsheet views of Azure Tables.
- We have begun to host science data:
  - IRIS, a consortium based in Seattle and sponsored by the National Science Foundation to collect and distribute global seismological data. Two petabytes of seismic data collected. Data sets requested by researchers worldwide. Includes HD videos, seismograms, images, and data from major earthquakes.
  - Ocean Observatory data from UW.
  - NCBI genomic data.
  - More to come.

Our Vision
- Reinvent the data center as a computer.
- Optimize internal practice for industry-leading advantage.
- Reduce cost by a factor of four while delivering more performance.
- Invent and exploit game-changing technologies (hardware and software).
- Change the data center products available to Microsoft that create competitive advantage.
- Drive standards for new technologies.
- Design and build a research data center for integrated experiments.
- Anticipate the next generation of data center applications.

Exploring applications that can drive future data center hardware and software:
- On-demand face recognition: from photo on cell phone to the cloud and back, to test the client-to-cloud application design.
- Collaborative virtual reality: to test scalable networks and cloud rendering.
- Natural language translation: real-time voice to voice.

- Health and lifestyle management: exploit ubiquitous sensor data to monitor my core health, help me watch my special diet, and keep me in touch with my family (always-on senses).
- Personal information agents: watch me write and do the background research. Do long-term planning/problem solving for me.
- My robot control center: manage my 1000 robots. Keep track of my smart dust.

- The data centres and cloud computing I was anticipating in previous talks are now here, with several commercial offers.
- Traditional scientific computing solutions (grid, cluster, supercomputers) are facing long-term sustainability, energy and environmental issues.
- Funding agencies are finding it increasingly difficult to sustain computing infrastructures forever.
- Need to develop new business models (pay by use).
- Virtualisation everywhere, including in funding for scientific computing infrastructures.

Thanks to the organizers for the kind invitation and to all of you for your attention.
Contact me at: Fabrig@microsoft.com