Dr. Fabrizio Gagliardi
EMEA Director, External Research, Microsoft Research
I3 - Internet - Infrastructures Innovations
PSNC, Poznan (PL), November 2009
Most of these slides come from Dennis Gannon, Director, and Dan Reed, CVP, in the MS Extreme Computing Group (XCG), both long-time pioneers in HPC and now with Microsoft Research.
We are at an inflection point in the evolution of distributed computing (nothing new under the sun). Grid remains a good solution for a reduced number of communities (and often for social/political reasons). Cloud computing and hosted services are emerging as the next incarnation of distributed computing, with some obvious additional advantages (think of data centres located in Iceland or next to cheap and renewable energy sources).
MSR definition: cloud computing means using a remote data center to manage scalable, reliable, on-demand access to applications. Scalable means possibly millions of simultaneous users of the app, exploiting thousand-fold parallelism in the app. Reliable, on-demand means five nines (99.999%) availability, right now. Applications span the continuum from the client to the cloud.
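To make "five nines" concrete, here is a small C# sketch of the arithmetic (my illustration, not from the slide): 99.999% availability leaves barely five minutes of downtime per year.

// What "five nines" allows: plain arithmetic, not a slide figure.
using System;

class FiveNines
{
    static void Main()
    {
        double availability = 0.99999;             // "five nines"
        double minutesPerYear = 365.25 * 24 * 60;  // ~525,960 minutes
        double downtime = (1 - availability) * minutesPerYear;
        Console.WriteLine($"Allowed downtime: {downtime:F2} minutes/year"); // ~5.26
    }
}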
Philosophy: the data center is a computer that must be designed and programmed as an integrated system. Processors: multicore, heterogeneity, low power, chip stacking. Storage: on-chip FLASH, PCM. Optics: optical interconnect. Networks: distributed routing, non-TCP/IP, over-provisioned. Packaging: modularity, liquid cooling. Software: virtualization, introspection, tier-splitting, adaptation, resilience.
How do you support email for 375 million users? Store and index 6.75 trillion photos? Support 10 billion web search queries/month? And deliver a quality response in 0.15 seconds to millions of simultaneous users, and never go down? The future goes well beyond web search.
The data: experiments, simulations, archives, literature, consumer sources. Petabytes, doubling every 2 years. The challenge: enable discovery; deliver the capability to mine, search and analyze this data in near real time. The response: a massive private-sector build-out of data centers.
Range in size from edge facilities to megascale. Economies of scale: approximate costs for a medium-sized center (1,000 servers) versus a very large, 50K-server center:

Technology      Medium-sized data center      Very large data center          Ratio
Network         $95 per Mbps/month            $13 per Mbps/month              7.1
Storage         $2.20 per GB/month            $0.40 per GB/month              5.7
Administration  ~140 servers/administrator    >1,000 servers/administrator    7.1

Each data center is 11.5 times the size of a football field.
Conquering complexity: building racks of servers and complex cooling systems all separately is not efficient. Package and deploy into bigger units (containers).
The EPA released a report saying that in 2006 data centers used 61 terawatt-hours of power: a total power bill of $4.5 billion, a 7 GW peak load (15 power plants), and 44.4 million metric tons of CO2 (0.8% of emissions). This was 1.5% of all US electrical energy use, and it is expected to double by 2011. A new challenge and a green initiative: a deeper look and a few ideas follow.
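As a rough cross-check of those figures, here is my own arithmetic in C#, assuming a ~$0.07/kWh utility price (which is why the bill lands near, rather than exactly at, $4.5 billion):

// Consistency check on the EPA figures: 61 TWh/year is about 7 GW of
// continuous load, and at ~$0.07/kWh it costs roughly $4.3B per year.
using System;

class EpaFigures
{
    static void Main()
    {
        double twhPerYear = 61;
        double hoursPerYear = 8760;
        double dollarsPerKwh = 0.07;                       // assumed price
        double avgGw = twhPerYear * 1000 / hoursPerYear;   // TWh -> GWh -> GW
        double billUsd = twhPerYear * 1e9 * dollarsPerKwh; // TWh -> kWh * $/kWh
        Console.WriteLine($"Average load: {avgGw:F1} GW");        // ~7.0 GW
        Console.WriteLine($"Annual bill: ${billUsd / 1e9:F1}B");  // ~$4.3B
    }
}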
Where are the costs? Consider a mid-sized facility (20 containers):
Cost of power: $0.07/kWh
Cost of facility: $200,000,000 (amortized over 15 years)
Number of servers: 50,000 (3-year life) at $2K each
Critical power load: 15 MW
Power Usage Effectiveness (PUE): 1.7

Monthly costs (3-year server and 15-year infrastructure amortization):
Servers: $2,997,090
Power & cooling infrastructure: $1,296,902
Power: $1,042,440
Other infrastructure: $284,686

Observe: the fully burdened cost of power = power consumed + the cost of the cooling and power-distribution infrastructure. As the cost of servers drops and power costs rise, power will come to dominate all other costs.
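A minimal C# sketch of the fully-burdened-power arithmetic. The inputs come from the slide, but it assumes the critical load runs at peak around the clock, so it overshoots the slide's $1,042,440 power line (which reflects average utilization below peak):

// Fully burdened power: PUE = total facility power / IT (critical) power,
// so cooling and distribution overhead is (PUE - 1) times the IT load.
using System;

class BurdenedPower
{
    static void Main()
    {
        double criticalLoadKw = 15000;       // 15 MW critical load
        double pue = 1.7;                    // Power Usage Effectiveness
        double dollarsPerKwh = 0.07;         // price from the slide
        double hoursPerMonth = 8760 / 12.0;  // 730 hours

        double totalKw = criticalLoadKw * pue;           // 25,500 kW
        double overheadKw = criticalLoadKw * (pue - 1);  // 10,500 kW
        double monthlyBill = totalKw * hoursPerMonth * dollarsPerKwh;

        Console.WriteLine($"Total draw: {totalKw:N0} kW ({overheadKw:N0} kW overhead)");
        Console.WriteLine($"Monthly power bill at peak load: {monthlyBill:C0}"); // ~$1.3M
    }
}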
Data centers use 1.5% of US electricity: $4.5 billion annually, a 7 GW peak load (15 power plants), 44.4 million metric tons of CO2 (0.8% of emissions). Rethink environmentals: run them in a wider range of conditions. Rethink the UPS: Christian Belady's data-center-in-a-tent experiment; Google's battery per server. Rethink the architecture: Intel Atom and power states; the Marlowe project.
Cloud apps connect people to insight from information, experience, and discovery. Most cloud apps are immediate, scalable and persistent. The cloud is also a platform for massive data analysis, though not a replacement for leading-edge supercomputers. The programming model must support scalability in two dimensions: thousands of simultaneous users of the same app, and apps that require thousands of cores for each use.
LINQ: .NET Language Integrated Query. Declarative, SQL-like programming with C# and Visual Studio; easy expression of data parallelism; an elegant and unified data model. DryadLINQ generates a query plan automatically from the LINQ query (here, logs -> where -> select) and hands it to Dryad for distributed execution:

var logentries = from line in logs
                 where !line.StartsWith("#")
                 select new LogEntry(line);

Source: Yuan Yu et al.
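For readers who want to run that query locally, here is a self-contained LINQ-to-objects version. It is a sketch: LogEntry is a hypothetical record type standing in for the one on the slide, and plain LINQ stands in for DryadLINQ, which executes the same declarative form across a cluster.

// The slide's query, runnable as ordinary LINQ over in-memory strings.
using System;
using System.Linq;

class LogEntry
{
    public string Line { get; }
    public LogEntry(string line) { Line = line; }
}

class Program
{
    static void Main()
    {
        string[] logs =
        {
            "# comment header",
            "GET /index.html 200",
            "GET /missing 404",
        };

        // Keep every line that is not a comment, wrapping each in a LogEntry.
        var logentries = from line in logs
                         where !line.StartsWith("#")
                         select new LogEntry(line);

        foreach (var entry in logentries)
            Console.WriteLine(entry.Line);   // prints the two non-comment lines
    }
}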
Infrastructure as a service: a way to host virtual machines on demand. Amazon EC2 and S3: you configure your VM, load, and go. Application as a service: Hadoop and Dryad are application frameworks for data-parallel analysis. Platform as a service: you write an app against cloud APIs and release it, and the platform manages and scales it for you. Google App Engine: write a Python program to access BigTable, upload it, and run it in a Python cloud.
Infrastructure as a Service Platform as a Service Software as a Service
Map/Reduce-style parallel BLAST: take DNA samples and search for matches. The architecture is basic MapReduce: a BLAST web role where the user selects databases and an input sequence; an input-splitter worker role that partitions the 500 MB input file; BLAST execution worker roles #1 through #n, each holding a 2 GB database, with genome DBs 1 through K and the BLAST DB configuration in Azure Blob Storage; and a combiner worker role that merges the results (see the sketch below). On a full metagenomics sample of 363,876 records: 50 roles took 94,320 s (speedup 45); 100 roles took 45,000 s (speedup 94). Next step: 1,000 roles on a 20 GB input sample.
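A minimal C# sketch of that splitter -> worker -> combiner pattern, with PLINQ standing in for Azure worker roles; Partition, MatchAgainstDb, the toy match criterion, and the sequence data are all hypothetical stand-ins for real BLAST.

// Splitter/worker/combiner in miniature: partition the input, score each
// chunk in parallel, then merge the per-worker hit lists.
using System;
using System.Collections.Generic;
using System.Linq;

class ParallelBlastSketch
{
    // Splitter: carve the input into one chunk per worker (round-robin).
    static IEnumerable<string[]> Partition(string[] records, int workers) =>
        records.Select((r, i) => (r, i))
               .GroupBy(t => t.i % workers)
               .Select(g => g.Select(t => t.r).ToArray());

    // Map: each "worker role" scores its chunk against the database.
    static IEnumerable<string> MatchAgainstDb(string[] chunk) =>
        chunk.Where(seq => seq.Contains("GATTACA")); // toy match criterion

    static void Main()
    {
        string[] records = { "GATTACAAC", "TTTT", "CCGATTACA", "AAAA" };

        // Combine: merge the per-worker results, as the combiner role does.
        var hits = Partition(records, workers: 2)
                   .AsParallel()                 // stand-in for n worker roles
                   .SelectMany(MatchAgainstDb)
                   .ToList();

        Console.WriteLine($"{hits.Count} matches"); // prints "2 matches"
    }
}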
PhyloD is a statistical tool used to analyze the DNA of HIV from large studies of infected patients. It was developed by Microsoft Research and has been highly impactful for a small but important group of researchers: hundreds of HIV and HepC researchers actively use it, and thousands of research communities rely on these results (cover of PLoS Biology, November 2008). A typical job takes 10-20 CPU hours, with extreme jobs requiring 1,000-2,000 CPU hours; it is very CPU-efficient, requires a large number of test runs per job (1-10 million tests), and produces highly compressed data (~100 KB per job). This highlights Windows Azure's potential for agile deployment of science-related services that scale. Courtesy of Roger Barga.
We have built plugins for Matlab to talk to the cloud, and Excel spreadsheet views of Azure Tables. We have begun to host science data. IRIS, a consortium based in Seattle and sponsored by the National Science Foundation to collect and distribute global seismological data, has collected two petabytes of seismic data, with data sets requested by researchers worldwide, including HD videos, seismograms, images, and data from major earthquakes. Also: Ocean Observatory data from UW, NCBI genomic data, and more to come.
Our vision: reinvent the data center as a computer. Optimize internal practice for industry-leading advantage: reduce cost by a factor of four while delivering more performance. Invent and exploit game-changing technologies (hardware and software): change the data center products available to Microsoft that create competitive advantage, and drive standards for new technologies. Design and build a research data center for integrated experiments. Anticipate the next generation of data center applications.
Exploring applications that can drive future data center hardware and software: on-demand face recognition, from a photo on a cell phone to the cloud and back, to test client-to-cloud application design; collaborative virtual reality, to test scalable networks and cloud rendering; natural language translation, real-time voice to voice.
Health and lifestyle management: exploit ubiquitous sensor data to monitor my core health, help me watch my special diet, and keep me in touch with my family (always-on sensors). Personal information agents: watch me write and do the background research; do long-term planning and problem solving for me. My robot control center: manage my 1,000 robots; keep track of my smart dust.
The data centres and cloud computing I was anticipating in previous talks are now here, with several commercial offers. Traditional scientific computing solutions (grid, cluster, supercomputers) face long-term sustainability, energy and environmental issues. Funding agencies are finding it increasingly difficult to sustain computing infrastructures forever, so new business models (pay per use) must be developed. Virtualisation is everywhere, including in the funding of scientific computing infrastructures.
Thanks to the organizers for the kind invitation, and to all of you for your attention. Contact me at: fabrig@microsoft.com