Performing Large Science Experiments on Azure: Pitfalls and Solutions


1 Performing Large Science Experiments on Azure: Pitfalls and Solutions. Wei Lu, Jared Jackson, Jaliya Ekanayake, Roger Barga, Nelson Araujo. Microsoft eXtreme Computing Group

2 Windows Azure: Application, Compute, Storage, Fabric

3 Suggested Application Model: using queues for reliable messaging. A Web Role (IIS hosting ASP.NET, WCF, etc.) puts work in a queue (step 2); a Worker Role (main() { }) gets work from the queue (step 3) and does the work (step 4). To scale, add more of either role. Benefits: decouples the system, absorbs bursts, is resilient to instance failure, and is easy to scale.

4 Azure Queue: the communication channel between instances. Messages in the queue are reliable and durable, with a 7-day lifetime. Fault tolerance mechanism: a de-queued message becomes visible again after the visibilitytimeout if it is not deleted (2-hour maximum). This calls for idempotent processing.
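The visibility-timeout behavior described on this slide can be illustrated with a small in-memory simulation. This is only a sketch: `VisibilityQueue` and `process` are hypothetical names, time is passed in explicitly as minutes, and the real Azure Queue service has more to it (for example, pop receipts) that is not modeled here.

```python
import itertools


class VisibilityQueue:
    """Toy in-memory queue with a visibility timeout (not the Azure API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}            # message id -> [body, visible_at]
        self._ids = itertools.count()

    def put(self, body):
        self._messages[next(self._ids)] = [body, 0.0]

    def get(self, now):
        """Return (id, body) of the first visible message and hide it, else None."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)


def process(task, results):
    """Idempotent work: re-running the same task rewrites the same key."""
    results[task] = task.upper()


queue = VisibilityQueue(visibility_timeout=120)   # minutes, i.e. the 2-hour cap
queue.put("partition-7")
results = {}

# Worker A takes the task at t=0 and crashes before deleting the message.
msg_id, task = queue.get(now=0)
# While the message is hidden, no other worker can take it.
hidden = queue.get(now=60)
# After the timeout the message is visible again; worker B completes it.
msg_id, task = queue.get(now=121)
process(task, results)
queue.delete(msg_id)
```

Because `process` writes to a key derived from the task, a task that is executed twice leaves the same result, which is the point of idempotent processing.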

5 AzureBLAST. BLAST (Basic Local Alignment Search Tool) is the most important software in bioinformatics; it identifies the similarity between bio-sequences. BLAST is highly computation-intensive: it performs a large number of pairwise alignment operations, and the size of sequence databases has been growing exponentially. There are two choices for running large BLAST jobs: building a local cluster, or submitting jobs to NCBI or EBI, which means long job queuing times. BLAST is easy to parallelize by query segmentation: a splitting task fans the queries out to parallel BLAST tasks, and a merging task combines their outputs.
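The query-segmentation pattern (split, run BLAST tasks in parallel, merge) can be sketched as follows. All names are hypothetical, and `blast_partition` is a stand-in that only reports sequence lengths; a real task would invoke the BLAST binary on its partition.

```python
def split_queries(sequences, partition_size):
    """Splitting task: cut the query list into fixed-size partitions."""
    return [sequences[i:i + partition_size]
            for i in range(0, len(sequences), partition_size)]


def blast_partition(partition):
    """Stand-in for one BLAST task; a real task would run the BLAST binary."""
    return [f"{seq_id}: {len(seq)} residues" for seq_id, seq in partition]


def merge(outputs):
    """Merging task: concatenate per-partition outputs in partition order."""
    return [line for output in outputs for line in output]


queries = [("q1", "MKV"), ("q2", "MKVLA"), ("q3", "M"), ("q4", "MK"), ("q5", "MKVL")]
partitions = split_queries(queries, partition_size=2)
# Each partition is independent, so the tasks could run on separate workers.
merged = merge(blast_partition(p) for p in partitions)
```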

6 AzureBLAST architecture: Web Role (Web Portal, Web Service, job registration); Job Management Role (Job Scheduler); global dispatch queue; Workers; Database Updating Role (NCBI databases); Azure Table (Job Registry); Azure Blob (BLAST databases, temporary data, etc.).

7 All-by-All BLAST experiment. An all-by-all query compares the database against itself, discovering homologs and the inter-relationships of known protein sequences. Large protein database (4.2 GB) with 9,865,668 sequences in total. In theory, 100 billion sequence comparisons! Performance estimation: the job would require 14 CPU-years. One of the biggest BLAST jobs as far as we know.

8 Our Solution. Allocated 3776 weighted instances (475 extra-large instances) from three datacenters: US South Central, West Europe, and North Europe. The 10 million sequences were divided into several segments; each segment was submitted to one datacenter as one job, and each segment consists of smaller partitions. In the end the job took two weeks. The total size of all outputs is ~230 GB.

9 Understanding Azure by analyzing logs. A normal log record sequence should look like this:
3/31/2010 6:14 RD00155D3611B0 Executing the task
3/31/2010 6:25 RD00155D3611B0 Execution of task is done, it takes 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task
3/31/2010 6:44 RD00155D3611B0 Execution of task is done, it takes 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task
3/31/2010 7:02 RD00155D3611B0 Execution of task is done, it takes mins
Otherwise, something is wrong (e.g., a lost task):
3/31/2010 8:22 RD00155D3611B0 Executing the task
3/31/2010 9:50 RD00155D3611B0 Executing the task
/31/ :12 RD00155D3611B0 Execution of task is done, it takes 82 mins
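The rule on this slide (a start record that is followed by another start record on the same instance, with no completion in between, indicates a lost task) can be checked mechanically. The sketch below assumes the log has already been parsed into `(timestamp, instance, message)` tuples; that tuple layout and the function name are assumptions, and only the two message prefixes come from the slide.

```python
def find_lost_tasks(records):
    """Return (instance, start_timestamp) for every task that was started
    but never reported as done before the same instance started another."""
    running = {}   # instance -> timestamp of the task it most recently started
    lost = []
    for timestamp, instance, message in records:
        if message.startswith("Executing the task"):
            if instance in running:
                # A new start without a completion: the earlier task was lost.
                lost.append((instance, running[instance]))
            running[instance] = timestamp
        elif message.startswith("Execution of task is done"):
            running.pop(instance, None)
    return lost


node = "RD00155D3611B0"
records = [
    ("3/31/2010 6:14", node, "Executing the task"),
    ("3/31/2010 6:25", node, "Execution of task is done, it takes 10.9 mins"),
    ("3/31/2010 8:22", node, "Executing the task"),
    ("3/31/2010 9:50", node, "Executing the task"),
]
lost = find_lost_tasks(records)
```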

10 Challenges & Pitfalls: failures; instance idle time; limitations of the current Azure Queue; performance/cost estimation; minimizing the need for programming.

11 Case Study 1: North Europe datacenter, 34,265 tasks processed in total. Node replacement: avoid using the machine name in your program. Almost one day of delay: try not to orchestrate instances with tight synchronization (e.g., a barrier).

12 Case Study 2: North Europe datacenter, 34,256 tasks processed in total. All 62 nodes lost tasks and then came back group by group. This is the update domain at work: ~6 nodes per group, ~30 mins.

13 Case Study 3: West Europe datacenter; 30,976 tasks were completed, and then the job was killed. 35 nodes experienced blob-writing failures at the same time. A reasonable guess: the fault domain is at work.

14 Challenges & Pitfalls: failures. Failures are to be expected but are unpredictable. Design with failure in mind. Most failures are recovered automatically by the cloud. (Remaining topics: instance idle time; limitations of the current Azure Queue; performance/cost estimation; minimizing the need for programming.)

15 Challenges & Pitfalls: instance idle time. Causes: gap time between two jobs; diversity of workload; load imbalance. (Remaining topics: limitations of the current Azure Queue; performance/cost estimation; minimizing the need for programming.)

16 Load imbalance: North Europe datacenter, 2058 tasks. Two days of very low system throughput due to some long-tail tasks. One task needed 8 hours to complete; it was re-executed by 8 nodes because of the 2-hour maximum value of a message's visibilitytimeout.

17 Challenges & Pitfalls: limitations of the current Azure Queue. 2-hour maximum value of visibilitytimeout: each individual task has to be done within 2 hours. 7-day maximum message lifetime: the entire experiment has to be done in less than 7 days. (Remaining topics: performance/cost estimation; minimizing the need for programming.)
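Both queue limits can be checked against a job plan before anything is dispatched. The following is a minimal sketch with hypothetical names; it assumes per-task duration estimates are available and uses total work divided by instance count as an optimistic lower bound on the job's duration, so a plan that passes this check can still run over in practice.

```python
from datetime import timedelta

VISIBILITY_TIMEOUT_MAX = timedelta(hours=2)   # per-task cap from the slide
MESSAGE_LIFETIME_MAX = timedelta(days=7)      # whole-experiment cap from the slide


def check_plan(task_estimates, instances):
    """Flag tasks over the 2-hour cap and jobs that cannot finish in 7 days."""
    too_long = [i for i, d in enumerate(task_estimates)
                if d > VISIBILITY_TIMEOUT_MAX]
    # Optimistic bound: perfect load balance, no failures, no idle time.
    ideal_makespan = sum(task_estimates, timedelta()) / instances
    return {
        "too_long": too_long,
        "exceeds_lifetime": ideal_makespan > MESSAGE_LIFETIME_MAX,
        "ideal_makespan": ideal_makespan,
    }


small = check_plan([timedelta(hours=1), timedelta(hours=8)], instances=2)
large = check_plan([timedelta(hours=1.5)] * 1000, instances=1)
```

Here `small` flags the 8-hour task (index 1), and `large` flags the whole plan because 1500 hours of work on one instance cannot fit in 7 days.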

18 Challenges & Pitfalls: performance/cost estimation. The better you understand your application, the more money you can save: for example, BLAST has about 20 arguments, and the VM size also has to be chosen. (Remaining topic: minimizing the need for programming.)

19 Cirrus: Parameter Sweeping Service on Azure. Components: Web Role (Web Portal, Web Service, job registration); Job Manager Role (Job Scheduler, Scaling Engine, Parametric Engine, Sampling Filter); Dispatch Queue; Workers; Azure Table; Azure Blob.

20 Job Manager Role: Job Definition (components: Job Scheduler, Scaling Engine, Parametric Engine, Sampling Filter). Declarative job definition, derived from Nimrod. Each job can have a prolog, commands, parameters, Azure-related operators (AzureCopy, AzureMount, SelectBlobs), and a job configuration. This minimizes the programming needed to run legacy binaries on Azure: BLAST, Bayesian network machine learning, image rendering. Example:
<job name="blast">
  <prolog>
    azurecopy uniref.fasta
  </prolog>
  <cmd>
    azurecopy %partition% input
    blastall.exe -p blastp -d uniref.fasta -i input -o output
    azurecopy output %partition%.out
  </cmd>
  <parameter name="partition">
    <selectblobs>
      <prefix>partitions/</prefix>
    </selectblobs>
  </parameter>
  <configure>
    <mininstances>2</mininstances>
    <maxinstances>4</maxinstances>
    <shutdownwhendone> true </shutdownwhendone>
    <sampling> true </sampling>
  </configure>
</job>
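To make the declarative job definition concrete, the sketch below parses a job of this shape and expands the `%partition%` parameter into one command list per matching blob. It is an illustration only, not the Cirrus implementation: `expand_tasks` is a hypothetical helper, and the blob names are passed in as a plain list where the real service would list them from Azure Blob storage.

```python
import xml.etree.ElementTree as ET

JOB_XML = """
<job name="blast">
  <prolog>azurecopy uniref.fasta</prolog>
  <cmd>
    azurecopy %partition% input
    blastall.exe -p blastp -d uniref.fasta -i input -o output
    azurecopy output %partition%.out
  </cmd>
  <parameter name="partition">
    <selectblobs><prefix>partitions/</prefix></selectblobs>
  </parameter>
</job>
"""


def expand_tasks(job_xml, blob_names):
    """Return one list of concrete commands per blob selected by the prefix."""
    job = ET.fromstring(job_xml)
    param = job.find("parameter")
    name = param.get("name")
    prefix = param.findtext("selectblobs/prefix").strip()
    # One command per non-empty line of the <cmd> element.
    commands = [line.strip() for line in job.findtext("cmd").splitlines()
                if line.strip()]
    return [
        [command.replace(f"%{name}%", blob) for command in commands]
        for blob in blob_names
        if blob.startswith(prefix)
    ]


tasks = expand_tasks(JOB_XML, ["partitions/p1", "partitions/p2", "other/x"])
```

Each entry of `tasks` is what a single worker would execute for one partition; the blob outside the `partitions/` prefix produces no task.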

21 Job Manager Role: Dynamic Scaling (components: Job Scheduler, Scaling Engine, Parametric Engine, Sampling Filter). Scaling in/out is done for each individual job and must fit into the [min, max] window specified in the job configuration. Synchronous scaling: tasks are dispatched after the scaling is done. Asynchronous scaling: task execution and the scaling operation proceed simultaneously. Scale in when load imbalance happens, when no new jobs have been received for a period of time, or when the job is configured as shutdown-when-done (usually used for the reducing job).
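The [min, max] window constraint can be sketched as a clamp on the desired instance count. The function name and the `tasks_per_instance` sizing heuristic are assumptions made for illustration; the slides do not say how the Scaling Engine picks its target.

```python
def target_instances(pending_tasks, tasks_per_instance, min_instances, max_instances):
    """Pick an instance count for the pending work, clamped to [min, max]."""
    wanted = -(-pending_tasks // tasks_per_instance)   # ceiling division
    return max(min_instances, min(max_instances, wanted))


# With the window from a job configured as min=2, max=4:
scale_out = target_instances(100, 10, min_instances=2, max_instances=4)
in_window = target_instances(25, 10, min_instances=2, max_instances=4)
scale_in = target_instances(0, 10, min_instances=2, max_instances=4)
```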

22 Job Pause-ReConfig-Resume. Each job maintains a task status table. Checkpoint by snapshotting the task table; a task can be incomplete. This works around the 7-day/2-hour limitations. It handles exceptions optimistically: ignore the exceptions, then retry the incomplete tasks with a reduced number of instances, minimizing the cost of failures. It also handles load imbalance.
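The pause-reconfig-resume pattern can be sketched with an in-memory task status table. The `Job` class and the status strings are hypothetical; the slides only say that the table is snapshotted and that incomplete tasks are retried, possibly on fewer instances.

```python
import copy


class Job:
    """Toy job that tracks per-task status so it can be paused and resumed."""

    def __init__(self, task_ids):
        self.status = {task_id: "pending" for task_id in task_ids}

    def snapshot(self):
        """Checkpoint: an independent copy of the task status table."""
        return copy.deepcopy(self.status)

    @classmethod
    def resume(cls, snapshot):
        """Rebuild a job from a checkpoint; anything not done is retried."""
        job = cls([])
        job.status = {task_id: ("done" if state == "done" else "pending")
                      for task_id, state in snapshot.items()}
        return job

    def incomplete(self):
        return [task_id for task_id, state in self.status.items()
                if state != "done"]


job = Job(["t1", "t2", "t3"])
job.status["t1"] = "done"
job.status["t2"] = "running"        # interrupted mid-task when the job pauses
checkpoint = job.snapshot()

# Later: resume from the checkpoint, possibly with a different instance count.
resumed = Job.resume(checkpoint)
to_retry = resumed.incomplete()
```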

23 Performance Estimation by Sampling: an observation-based approach (Job Manager Role components: Job Scheduler, Scaling Engine, Parametric Engine, Sampling Filter). Randomly sample the parameter space based on the sampling ratio a. Dispatch only the sample tasks, scaled in to only n instances to save cost. Assuming a uniform distribution, the estimation is done by extrapolating from the sample run.
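The estimation formula itself did not survive in this transcription, so the following sketch substitutes the simplest extrapolation consistent with the uniform-distribution assumption: multiply the mean sampled task time by the total number of tasks. The function names and this particular estimator are assumptions, not the authors' stated method.

```python
import random


def sample_tasks(tasks, ratio, seed=None):
    """Randomly pick a fraction `ratio` of the tasks (at least one)."""
    rng = random.Random(seed)
    count = max(1, round(len(tasks) * ratio))
    return rng.sample(tasks, count)


def estimate_total_time(sample_durations, total_tasks):
    """Extrapolate total task time from the sample mean (uniform assumption)."""
    mean = sum(sample_durations) / len(sample_durations)
    return mean * total_tasks


all_tasks = list(range(1000))
sampled = sample_tasks(all_tasks, ratio=0.02, seed=1)
# Durations would come from actually running the sampled tasks; these two
# values are made up purely to show the arithmetic.
estimate = estimate_total_time([10, 20], total_tasks=100)
```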

24 Evaluation. A complete BLAST run takes 2 hours with 16 instances. A 2% sampling run, which achieves 96% accuracy, takes only about 18 minutes with 2 instances. The overall cost of the sampling run is only 1.8% of the complete run.
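The cost ratio can be reproduced from the figures on this slide, assuming cost is proportional to instance time (instances multiplied by minutes); that billing model is an assumption here, and the helper name is hypothetical.

```python
def instance_minutes(minutes, instances):
    """Cost proxy: total instance time consumed by a run."""
    return minutes * instances


full_run = instance_minutes(minutes=120, instances=16)    # 1920 instance-minutes
sample_run = instance_minutes(minutes=18, instances=2)    # 36 instance-minutes
ratio = sample_run / full_run                             # 0.01875
```

The result, just under 1.9%, is in line with the 1.8% quoted on the slide given that the 18 minutes is itself approximate.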

25 Evaluation. Scaling out: the synchronous operation stalls all instances for 80 minutes; with the asynchronous operation, existing instances keep working while new instances need minutes to join, and the 16-instance run is 1.4x faster. Scaling in: the synchronous operation finished in 3 minutes; the asynchronous operation caused random message loss, which may lead to more idle instance time. Best practices: scale out asynchronously; scale in synchronously. Notes: new instances join in minutes; Azure randomly picks the instances to shut down.

26 Conclusion. Running large-scale parameter sweeping experiments on Azure. Identified pitfalls: design with failure in mind (most failures are recoverable); watch out for instance idle time; understand your application to save cost; minimize the need for programming. Our parameter sweeping solutions: declarative job definition; dynamic scaling; the job pause-reconfig-resume pattern; performance estimation.

Introduction to Windows Azure Cloud Computing Futures Group, Microsoft Research Roger Barga, Jared Jackson, Nelson Araujo, Dennis Gannon, Wei Lu, and

Introduction to Windows Azure Cloud Computing Futures Group, Microsoft Research Roger Barga, Jared Jackson, Nelson Araujo, Dennis Gannon, Wei Lu, and Introduction to Windows Azure Cloud Computing Futures Group, Microsoft Research Roger Barga, Jared Jackson, Nelson Araujo, Dennis Gannon, Wei Lu, and Jaliya Ekanayake Range in size from edge facilities

More information

Loosely coupled: asynchronous processing, decoupling of tiers/components Fan-out the application tiers to support the workload Use cache for data and content Reduce number of requests if possible Batch

More information

WINDOWS AZURE QUEUE. Table of Contents. 1 Introduction

WINDOWS AZURE QUEUE. Table of Contents. 1 Introduction WINDOWS AZURE QUEUE December, 2008 Table of Contents 1 Introduction... 1 2 Build Cloud Applications with Azure Queue... 2 3 Data Model... 5 4 Queue REST Interface... 6 5 Queue Usage Example... 7 5.1 A

More information

Windows Azure Services - At Different Levels

Windows Azure Services - At Different Levels Windows Azure Windows Azure Services - At Different Levels SaaS eg : MS Office 365 Paas eg : Azure SQL Database, Azure websites, Azure Content Delivery Network (CDN), Azure BizTalk Services, and Azure

More information

escience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in the Windows

escience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in the Windows escience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in the Windows Jie Li1, Deb Agarwal2, Azure Marty Platform Humphrey1, Keith Jackson2, Catharine van Ingen3, Youngryel Ryu4

More information

COMP6511A: Large-Scale Distributed Systems. Windows Azure. Lin Gu. Hong Kong University of Science and Technology Spring, 2014

COMP6511A: Large-Scale Distributed Systems. Windows Azure. Lin Gu. Hong Kong University of Science and Technology Spring, 2014 COMP6511A: Large-Scale Distributed Systems Windows Azure Lin Gu Hong Kong University of Science and Technology Spring, 2014 Cloud Systems Infrastructure as a (IaaS): basic compute and storage resources

More information

Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications

Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications Thilina Gunarathne, Tak-Lon Wu Judy Qiu, Geoffrey Fox School of Informatics, Pervasive Technology Institute Indiana University

More information

Developing Microsoft Azure Solutions

Developing Microsoft Azure Solutions Course 20532C: Developing Microsoft Azure Solutions Course details Course Outline Module 1: OVERVIEW OF THE MICROSOFT AZURE PLATFORM This module reviews the services available in the Azure platform and

More information

FLAT DATACENTER STORAGE CHANDNI MODI (FN8692)

FLAT DATACENTER STORAGE CHANDNI MODI (FN8692) FLAT DATACENTER STORAGE CHANDNI MODI (FN8692) OUTLINE Flat datacenter storage Deterministic data placement in fds Metadata properties of fds Per-blob metadata in fds Dynamic Work Allocation in fds Replication

More information

Yogesh Simmhan. escience Group Microsoft Research

Yogesh Simmhan. escience Group Microsoft Research External Research Yogesh Simmhan Group Microsoft Research Catharine van Ingen, Roger Barga, Microsoft Research Alex Szalay, Johns Hopkins University Jim Heasley, University of Hawaii Science is producing

More information

Developing Microsoft Azure Solutions: Course Agenda

Developing Microsoft Azure Solutions: Course Agenda Developing Microsoft Azure Solutions: 70-532 Course Agenda Module 1: Overview of the Microsoft Azure Platform Microsoft Azure provides a collection of services that you can use as building blocks for your

More information

Course Outline. Lesson 2, Azure Portals, describes the two current portals that are available for managing Azure subscriptions and services.

Course Outline. Lesson 2, Azure Portals, describes the two current portals that are available for managing Azure subscriptions and services. Course Outline Module 1: Overview of the Microsoft Azure Platform Microsoft Azure provides a collection of services that you can use as building blocks for your cloud applications. Lesson 1, Azure Services,

More information

Exam Questions

Exam Questions Exam Questions 70-475 Designing and Implementing Big Data Analytics Solutions https://www.2passeasy.com/dumps/70-475/ 1. Drag and Drop You need to recommend data storage mechanisms for the solution. What

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Piccolo: Building Fast, Distributed Programs

More information

Course Outline. Introduction to Azure for Developers Course 10978A: 5 days Instructor Led

Course Outline. Introduction to Azure for Developers Course 10978A: 5 days Instructor Led Introduction to Azure for Developers Course 10978A: 5 days Instructor Led About this course This course offers students the opportunity to take an existing ASP.NET MVC application and expand its functionality

More information

Distributed Systems. Tutorial 9 Windows Azure Storage

Distributed Systems. Tutorial 9 Windows Azure Storage Distributed Systems Tutorial 9 Windows Azure Storage written by Alex Libov Based on SOSP 2011 presentation winter semester, 2011-2012 Windows Azure Storage (WAS) A scalable cloud storage system In production

More information

Course Outline. Developing Microsoft Azure Solutions Course 20532C: 4 days Instructor Led

Course Outline. Developing Microsoft Azure Solutions Course 20532C: 4 days Instructor Led Developing Microsoft Azure Solutions Course 20532C: 4 days Instructor Led About this course This course is intended for students who have experience building ASP.NET and C# applications. Students will

More information

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568 FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected

More information

Azure-persistence MARTIN MUDRA

Azure-persistence MARTIN MUDRA Azure-persistence MARTIN MUDRA Storage service access Blobs Queues Tables Storage service Horizontally scalable Zone Redundancy Accounts Based on Uri Pricing Calculator Azure table storage Storage Account

More information

XLDB 11 Cloud Computing at Scale. Roger Barga Microsoft Research

XLDB 11 Cloud Computing at Scale. Roger Barga Microsoft Research XLDB 11 Cloud Computing at Scale Roger Barga Microsoft Research Framing Questions for Presentation(s) Does it make sense for large-scale (many terabytes, petabytes), data-intensive projects to consider

More information

Putting it together. Data-Parallel Computation. Ex: Word count using partial aggregation. Big Data Processing. COS 418: Distributed Systems Lecture 21

Putting it together. Data-Parallel Computation. Ex: Word count using partial aggregation. Big Data Processing. COS 418: Distributed Systems Lecture 21 Big Processing -Parallel Computation COS 418: Distributed Systems Lecture 21 Michael Freedman 2 Ex: Word count using partial aggregation Putting it together 1. Compute word counts from individual files

More information

Microsoft Developing Microsoft Azure Solutions.

Microsoft Developing Microsoft Azure Solutions. http://www.officialcerts.com 70-532 Microsoft Developing Microsoft Azure Solutions OfficialCerts.com is a reputable IT certification examination guide, study guides and audio exam provider. We ensure that

More information

Microsoft Windows HPC Server 2008 R2 for the Cluster Developer

Microsoft Windows HPC Server 2008 R2 for the Cluster Developer 50291B - Version: 1 02 May 2018 Microsoft Windows HPC Server 2008 R2 for the Cluster Developer Microsoft Windows HPC Server 2008 R2 for the Cluster Developer 50291B - Version: 1 5 days Course Description:

More information

AZURE CONTAINER INSTANCES

AZURE CONTAINER INSTANCES AZURE CONTAINER INSTANCES -Krunal Trivedi ABSTRACT In this article, I am going to explain what are Azure Container Instances, how you can use them for hosting, when you can use them and what are its features.

More information

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc Fuxi Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc {jiamang.wang, yongjun.wyj, hua.caihua, zhipeng.tzp, zhiqiang.lv,

More information

Users Application Virtual Machine Users Application Virtual Machine Users Application Virtual Machine Private Cloud Users Application Virtual Machine On-Premise Service Providers Private Cloud Users Application

More information

Microsoft_PrepKing_70-583_v _85q_By-Cath. if u wana pass the exam with good percentage dn follow this dump

Microsoft_PrepKing_70-583_v _85q_By-Cath. if u wana pass the exam with good percentage dn follow this dump Microsoft_PrepKing_70-583_v2011-11-25_85q_By-Cath Number: 70-583 Passing Score: 800 Time Limit: 120 min File Version: 2011-11-25 http://www.gratisexam.com/ Exam : Microsoft_PrepKing_70-583 Ver :2011-11-25

More information

ACCURATE STUDY GUIDES, HIGH PASSING RATE! Question & Answer. Dump Step. provides update free of charge in one year!

ACCURATE STUDY GUIDES, HIGH PASSING RATE! Question & Answer. Dump Step. provides update free of charge in one year! DUMP STEP Question & Answer ACCURATE STUDY GUIDES, HIGH PASSING RATE! Dump Step provides update free of charge in one year! http://www.dumpstep.com Exam : 70-532 Title : Developing Microsoft Azure Solutions

More information

Developing Microsoft Azure Solutions

Developing Microsoft Azure Solutions Developing Microsoft Azure Solutions Duration: 5 Days Course Code: M20532 Overview: This course is intended for students who have experience building web applications. Students should also have experience

More information

MapReduce for Data Intensive Scientific Analyses

MapReduce for Data Intensive Scientific Analyses apreduce for Data Intensive Scientific Analyses Jaliya Ekanayake Shrideep Pallickara Geoffrey Fox Department of Computer Science Indiana University Bloomington, IN, 47405 5/11/2009 Jaliya Ekanayake 1 Presentation

More information

ebay s Architectural Principles

ebay s Architectural Principles ebay s Architectural Principles Architectural Strategies, Patterns, and Forces for Scaling a Large ecommerce Site Randy Shoup ebay Distinguished Architect QCon London 2008 March 14, 2008 What we re up

More information

Patterns on XRegional Data Consistency

Patterns on XRegional Data Consistency Patterns on XRegional Data Consistency Contents The problem... 3 Introducing XRegional... 3 The solution... 5 Enabling consistency... 6 The XRegional Framework: A closer look... 8 Some considerations...

More information

CLUSTERING HIVEMQ. Building highly available, horizontally scalable MQTT Broker Clusters

CLUSTERING HIVEMQ. Building highly available, horizontally scalable MQTT Broker Clusters CLUSTERING HIVEMQ Building highly available, horizontally scalable MQTT Broker Clusters 12/2016 About this document MQTT is based on a publish/subscribe architecture that decouples MQTT clients and uses

More information

Vlad Vinogradsky

Vlad Vinogradsky Vlad Vinogradsky vladvino@microsoft.com http://twitter.com/vladvino Commercially available cloud platform offering Billing starts on 02/01/2010 A set of cloud computing services Services can be used together

More information

Techno Expert Solutions

Techno Expert Solutions Course Content of Microsoft Windows Azzure Developer: Course Outline Module 1: Overview of the Microsoft Azure Platform Microsoft Azure provides a collection of services that you can use as building blocks

More information

Developing Microsoft Azure Solutions (MS 20532)

Developing Microsoft Azure Solutions (MS 20532) Developing Microsoft Azure Solutions (MS 20532) COURSE OVERVIEW: This course is intended for students who have experience building ASP.NET and C# applications. Students will also have experience with the

More information

Adaptive Cluster Computing using JavaSpaces

Adaptive Cluster Computing using JavaSpaces Adaptive Cluster Computing using JavaSpaces Jyoti Batheja and Manish Parashar The Applied Software Systems Lab. ECE Department, Rutgers University Outline Background Introduction Related Work Summary of

More information

Most real programs operate somewhere between task and data parallelism. Our solution also lies in this set.

Most real programs operate somewhere between task and data parallelism. Our solution also lies in this set. for Windows Azure and HPC Cluster 1. Introduction In parallel computing systems computations are executed simultaneously, wholly or in part. This approach is based on the partitioning of a big task into

More information

ebay Marketplace Architecture

ebay Marketplace Architecture ebay Marketplace Architecture Architectural Strategies, Patterns, and Forces Randy Shoup, ebay Distinguished Architect QCon SF 2007 November 9, 2007 What we re up against ebay manages Over 248,000,000

More information

Datacenter replication solution with quasardb

Datacenter replication solution with quasardb Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION

More information

20532D: Developing Microsoft Azure Solutions

20532D: Developing Microsoft Azure Solutions 20532D: Developing Microsoft Azure Solutions Course Details Course Code: Duration: Notes: 20532D 5 days Elements of this syllabus are subject to change. About this course This course is intended for students

More information

EMC RecoverPoint. EMC RecoverPoint Support

EMC RecoverPoint. EMC RecoverPoint Support Support, page 1 Adding an Account, page 2 RecoverPoint Appliance Clusters, page 3 Replication Through Consistency Groups, page 4 Group Sets, page 22 System Tasks, page 24 Support protects storage array

More information

Distributed ETL. A lightweight, pluggable, and scalable ingestion service for real-time data. Joe Wang

Distributed ETL. A lightweight, pluggable, and scalable ingestion service for real-time data. Joe Wang A lightweight, pluggable, and scalable ingestion service for real-time data ABSTRACT This paper provides the motivation, implementation details, and evaluation of a lightweight distributed extract-transform-load

More information

PERFORMANCE OPTIMIZATION FOR LARGE SCALE LOGISTICS ERP SYSTEM

PERFORMANCE OPTIMIZATION FOR LARGE SCALE LOGISTICS ERP SYSTEM PERFORMANCE OPTIMIZATION FOR LARGE SCALE LOGISTICS ERP SYSTEM Santosh Kangane Persistent Systems Ltd. Pune, India September 2013 Computer Measurement Group, India 1 Logistic System Overview 0.5 millions

More information

microsoft. Number: Passing Score: 800 Time Limit: 120 min.

microsoft.  Number: Passing Score: 800 Time Limit: 120 min. 70-534 microsoft Number: 70-534 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Drag and Drop Question You need to recommend data storage mechanisms for the solution. What should you recommend?

More information

Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds. Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng

Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds. Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng Virtual Clusters on Cloud } Private cluster on public cloud } Distributed

More information

Cohesity Microsoft Azure Data Box Integration

Cohesity Microsoft Azure Data Box Integration Cohesity Microsoft Azure Data Box Integration Table of Contents Introduction...2 Audience...2 Requirements...2 Assumptions...2 Order Microsoft Azure Data Box...3 Requesting...3 Order Details...4 Shipping

More information

Large-scale cluster management at Google with Borg

Large-scale cluster management at Google with Borg Large-scale cluster management at Google with Borg Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes Google Inc. Slides heavily derived from John Wilkes s presentation

More information

Apache Flink. Alessandro Margara

Apache Flink. Alessandro Margara Apache Flink Alessandro Margara alessandro.margara@polimi.it http://home.deib.polimi.it/margara Recap: scenario Big Data Volume and velocity Process large volumes of data possibly produced at high rate

More information

The MapReduce Abstraction

The MapReduce Abstraction The MapReduce Abstraction Parallel Computing at Google Leverages multiple technologies to simplify large-scale parallel computations Proprietary computing clusters Map/Reduce software library Lots of other

More information

<Hot>Table 1.1 lists the Infoblox vnios for Azure appliance models that are supported for this release. # of vcpu Cores. TE-V Yes

<Hot>Table 1.1 lists the Infoblox vnios for Azure appliance models that are supported for this release. # of vcpu Cores. TE-V Yes About Infoblox vnios for Azure Infoblox vnios for Azure is an Infoblox virtual appliance designed for deployments through Microsoft Azure, a collection of integrated cloud services in the Microsoft Cloud.

More information

Yves Goeleven. Solution Architect - Particular Software. Shipping software since Azure MVP since Co-founder & board member AZUG

Yves Goeleven. Solution Architect - Particular Software. Shipping software since Azure MVP since Co-founder & board member AZUG Storage Services Yves Goeleven Solution Architect - Particular Software Shipping software since 2001 Azure MVP since 2010 Co-founder & board member AZUG NServiceBus & MessageHandler Used azure storage?

More information

Rocksteady: Fast Migration for Low-Latency In-memory Storage. Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman

Rocksteady: Fast Migration for Low-Latency In-memory Storage. Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman Rocksteady: Fast Migration for Low-Latency In-memory Storage Chinmay Kulkarni, niraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman 1 Introduction Distributed low-latency in-memory key-value stores are

More information

The Stream Processor as a Database. Ufuk

The Stream Processor as a Database. Ufuk The Stream Processor as a Database Ufuk Celebi @iamuce Realtime Counts and Aggregates The (Classic) Use Case 2 (Real-)Time Series Statistics Stream of Events Real-time Statistics 3 The Architecture collect

More information

Distributed Systems 27. Process Migration & Allocation

Distributed Systems 27. Process Migration & Allocation Distributed Systems 27. Process Migration & Allocation Paul Krzyzanowski pxk@cs.rutgers.edu 12/16/2011 1 Processor allocation Easy with multiprocessor systems Every processor has access to the same memory

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

Pimp My Data Grid. Brian Oliver Senior Principal Solutions Architect <Insert Picture Here>

Pimp My Data Grid. Brian Oliver Senior Principal Solutions Architect <Insert Picture Here> Pimp My Data Grid Brian Oliver Senior Principal Solutions Architect (brian.oliver@oracle.com) Oracle Coherence Oracle Fusion Middleware Agenda An Architectural Challenge Enter the

More information

Qualys Cloud Platform

Qualys Cloud Platform 18 QUALYS SECURITY CONFERENCE 2018 Qualys Cloud Platform Looking Under the Hood: What Makes Our Cloud Platform so Scalable and Powerful Dilip Bachwani Vice President, Engineering, Qualys, Inc. Cloud Platform

More information

Batches and Commands. Overview CHAPTER

Batches and Commands. Overview CHAPTER CHAPTER 4 This chapter provides an overview of batches and the commands contained in the batch. This chapter has the following sections: Overview, page 4-1 Batch Rules, page 4-2 Identifying a Batch, page

More information

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2 Introduction :- Today single CPU based architecture is not capable enough for the modern database that are required to handle more demanding and complex requirements of the users, for example, high performance,

More information

Speeding up the execution of numerical computations and simulations with rcuda José Duato
