BigDataBench-MT: Multi-tenancy version of BigDataBench

Gang Lu
Beijing Academy of Frontier Science and Technology
BigDataBench Tutorial, ASPLOS 2016, Atlanta, GA, USA

Multi-tenancy software
- Software perspective
  Multi-tenancy refers to a principle in software architecture where a single instance of the software runs on a server and serves multiple client organizations (tenants). With a multi-tenancy architecture, a software application is designed to virtually partition its data and configuration, and each client organization works with a customized virtual application.

Problems of single-tenancy benchmarks
- They focus on a single run of a workload.
- Their scenarios are not realistic (simple and synthetic): they do not match the typical operating conditions of real systems, in which mixes of different percentages of tenants and workloads share the same computing infrastructure.
- We need to emulate real-world datacenter clusters with different numbers of tenants and various workload types, and consequently various benchmarking scenarios.

Multi-tenancy infrastructure
[Figure: application owners (tenants) reach the infrastructure resources of cloud infrastructures over the Internet]
General characteristics:
1. Resource pooling and broad network access
2. On-demand and elastic resource provision
3. Metered resources

Multi-tenancy workloads
[Figure: application owners (tenants) run big data workloads over the Internet on the infrastructure resources of cloud infrastructures]

BigDataBench-MT: Multi-tenancy version of BigDataBench
- Mining real-world workload traces (Google, Facebook, Sogou)
- Profiling workloads from BigDataBench
- Workload matching using machine learning techniques
- Parametric workload generation tool
- Benchmarking scenarios
  - Mixed workloads in public clouds
  - Data analytical workloads in private clouds

Example: Hadoop workloads
- An example of matching Hadoop workloads
  - Mining the Facebook workload trace (exact resource usage information: CPU, memory, I/O)
  - Profiling Hadoop workloads from BigDataBench (also collecting resource usage information)
  - Workload matching using k-means clustering
- Matching result: replaying basis

  Job type   Input size (GB)   Starting time (minutes)
  Bayes      2                 10
  Sort       1                 20
  K-means    0.5               25
  Bayes      5                 30
  Sort       1                 40
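
The matching step can be illustrated with a small sketch. It is not the BigDataBench-MT implementation: the resource vectors, the names `PROFILED`, `TRACE_JOBS`, `kmeans`, and `match_jobs`, and the choice to seed k-means with the profiled workloads are all assumptions made for illustration. The idea is to cluster trace jobs by resource usage and then label each cluster with the nearest profiled workload.

```python
import math

# Hypothetical, normalized resource-usage vectors: (CPU, memory, I/O).
# These numbers are made up for illustration; they are not measured data.
PROFILED = {
    "Bayes": (0.8, 0.5, 0.2),
    "Sort": (0.3, 0.4, 0.9),
    "K-means": (0.9, 0.7, 0.1),
}

# Hypothetical jobs mined from a workload trace, in the same vector format.
TRACE_JOBS = [
    (0.75, 0.55, 0.25),
    (0.35, 0.35, 0.85),
    (0.95, 0.65, 0.15),
    (0.25, 0.45, 0.95),
]


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, initial_centroids, max_iterations=20):
    """Lloyd's algorithm. Returns (centroids, assignment) after convergence."""
    centroids = list(initial_centroids)
    assignment = [nearest(p, centroids) for p in points]
    for _ in range(max_iterations):
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
        new_assignment = [nearest(p, centroids) for p in points]
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return centroids, assignment


def match_jobs(trace_jobs, profiled):
    """Cluster trace jobs, then label each cluster with the closest profiled workload."""
    names = list(profiled)
    # Assumption: seed the clusters with the profiled workload vectors.
    centroids, assignment = kmeans(trace_jobs, [profiled[n] for n in names])
    labels = [min(names, key=lambda n: math.dist(c, profiled[n])) for c in centroids]
    return [labels[a] for a in assignment]


matches = match_jobs(TRACE_JOBS, PROFILED)
print(matches)
```

Each trace job ends up tagged with a benchmark job type, which is the kind of information a replaying basis like the table above needs, alongside input size and start time taken from the trace.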

Example: mixed workloads
- An example of mixing search engine and Hadoop workloads
  - Hadoop jobs with matching replaying basis
  - Nutch searching with matching replaying basis
  - Decide how many tenants to emulate and decide the workload types
  - Parametric workload generation tool
  - Benchmarking scenarios: mixed workloads in public clouds
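
A rough sketch of what a parametric generator might do with these inputs follows. It is not the BigDataBench-MT tool: `generate_scenario`, the `stagger_min` parameter, the round-robin assignment of workload types to tenants, and the `SEARCH_BASIS` values are all hypothetical. Only the Hadoop replaying basis is taken from the Hadoop example in these slides.

```python
# Hadoop replaying basis from the Hadoop example:
# (job type, input size in GB, start time in minutes).
HADOOP_BASIS = [
    ("Bayes", 2, 10),
    ("Sort", 1, 20),
    ("K-means", 0.5, 25),
    ("Bayes", 5, 30),
    ("Sort", 1, 40),
]

# Hypothetical search replaying basis: (requests per second, start minute).
SEARCH_BASIS = [(50, 0), (120, 15), (80, 35)]


def generate_scenario(num_tenants, workload_types, stagger_min=5):
    """Build one merged timeline of (start_min, tenant, workload, detail) events.

    Workload types are assigned to tenants round-robin, and each tenant's
    replaying basis is shifted by stagger_min minutes per tenant index.
    Both choices are assumptions made for this sketch.
    """
    if num_tenants < 1 or not workload_types:
        raise ValueError("need at least one tenant and one workload type")
    events = []
    for i in range(num_tenants):
        tenant = f"tenant-{i}"
        kind = workload_types[i % len(workload_types)]
        offset = i * stagger_min
        if kind == "hadoop":
            for job, gb, start in HADOOP_BASIS:
                events.append((start + offset, tenant, f"hadoop:{job}", f"{gb} GB"))
        elif kind == "search":
            for rps, start in SEARCH_BASIS:
                events.append((start + offset, tenant, "nutch:search", f"{rps} req/s"))
        else:
            raise ValueError(f"unknown workload type: {kind}")
    return sorted(events)  # order by start time, then tenant name


scenario = generate_scenario(3, ["hadoop", "search"])
for start, tenant, workload, detail in scenario:
    print(f"{start:3d} min  {tenant}  {workload}  {detail}")
```

Changing `num_tenants` or the list of workload types yields a different benchmarking scenario from the same replaying bases.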

What can you do with BigDataBench-MT?
- We consider two dimensions of the benchmarking scenarios:
  - From the tenants' perspective
  - From the workloads' perspective

You can specify the tenants
- The number of tenants
  Scalability benchmark: how many tenants are able to run in parallel?
- The priorities of tenants
  Fairness benchmark: how fair is the system, i.e., are the available resources equally available to all tenants? What happens if tenants have different priorities?
- Timeline
  How do the number and priorities of tenants change over time?
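
The slides do not name a fairness metric. One common choice, used here purely as an assumption, is Jain's fairness index, which is 1.0 when every tenant receives the same share and falls to 1/n when a single tenant receives everything. The `weights` parameter is a hypothetical way to account for tenant priorities: each allocation is divided by its weight before the index is computed.

```python
def jain_index(allocations, weights=None):
    """Jain's fairness index over per-tenant allocations.

    allocations: resources (or throughput) each tenant actually received.
    weights: optional positive priorities; an allocation proportional to the
             weights counts as perfectly fair (assumption of this sketch).
    """
    if not allocations:
        raise ValueError("need at least one tenant")
    if weights is None:
        weights = [1.0] * len(allocations)
    if len(weights) != len(allocations):
        raise ValueError("allocations and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    normalized = [a / w for a, w in zip(allocations, weights)]
    total = sum(normalized)
    squares = sum(x * x for x in normalized)
    if squares == 0:
        return 1.0  # nobody received anything, so shares are trivially equal
    return (total * total) / (len(normalized) * squares)


equal_share = jain_index([10, 10, 10, 10])
one_tenant_wins = jain_index([40, 0, 0, 0])
priority_aware = jain_index([20, 10], weights=[2, 1])
print(equal_share, one_tenant_wins, priority_aware)
```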

You can specify the workloads
- Data characteristics
  - Data type, source
  - Input/output data volumes, distributions
- Computation semantics
  - Algorithms
  - Big data software stacks
- Job arrival patterns
  - Arrival rate
  - Arrival sequence
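
To make "arrival rate" and "arrival sequence" concrete, the sketch below generates synthetic arrivals. It assumes a Poisson arrival process and a weighted random job mix. Neither is stated in the slides, where arrival patterns come from real traces, and the function names and parameters are hypothetical.

```python
import random


def poisson_arrivals(rate_per_min, duration_min, seed=0):
    """Arrival times (in minutes) of a Poisson process with the given rate.

    Inter-arrival gaps are drawn from an exponential distribution whose
    mean is 1 / rate_per_min. A fixed seed makes the pattern repeatable.
    """
    if rate_per_min <= 0:
        raise ValueError("rate_per_min must be positive")
    rng = random.Random(seed)
    arrivals = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_per_min)
        if t >= duration_min:
            return arrivals
        arrivals.append(t)


def arrival_sequence(arrivals, job_mix, seed=0):
    """Attach a job type to each arrival, drawn according to job_mix weights."""
    rng = random.Random(seed)
    types = list(job_mix)
    weights = [job_mix[t] for t in types]
    return [(t, rng.choices(types, weights=weights)[0]) for t in arrivals]


# Example: on average one job every 5 minutes for one hour,
# with Sort twice as likely as Bayes (hypothetical mix).
arrivals = poisson_arrivals(rate_per_min=0.2, duration_min=60, seed=1)
sequence = arrival_sequence(arrivals, {"Sort": 2, "Bayes": 1}, seed=1)
for minute, job in sequence:
    print(f"{minute:6.2f} min  {job}")
```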

You can specify the interference
- Each individual tenant: different types of workloads
  How do they interfere with each other along different resource dimensions?
- Multiple tenants
  How well are tenants isolated from one another with respect to performance? How do individual tenants influence other tenants' performance?
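
One simple way to quantify interference, offered here as an assumption rather than the metric BigDataBench-MT uses, is slowdown: the runtime of a workload when co-located with another, divided by its runtime when it runs alone. The runtimes below are made-up numbers, and `interference_matrix` is a hypothetical helper.

```python
# Hypothetical runtimes in seconds when each workload runs alone.
ISOLATED = {"Sort": 100.0, "Bayes": 200.0}

# Hypothetical runtimes of the first workload while the second runs alongside it.
COLOCATED = {
    ("Sort", "Bayes"): 150.0,
    ("Bayes", "Sort"): 220.0,
    ("Sort", "Sort"): 190.0,
}


def interference_matrix(isolated, colocated):
    """Map (victim, neighbor) to the victim's slowdown factor.

    A value of 1.0 means perfect isolation; larger values mean the
    neighbor degrades the victim's performance.
    """
    return {
        (victim, neighbor): runtime / isolated[victim]
        for (victim, neighbor), runtime in colocated.items()
    }


matrix = interference_matrix(ISOLATED, COLOCATED)
worst_pair = max(matrix, key=matrix.get)
for (victim, neighbor), factor in sorted(matrix.items()):
    print(f"{victim} next to {neighbor}: {factor:.2f}x")
print("worst pair:", worst_pair)
```

With these numbers, two Sort jobs sharing a node hurt each other most, which would be consistent with both competing for the same resource dimension.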

Current status
- Multi-tenancy V1.0 release:
  - Emulates workloads based on real-world workload traces
  - Supports mixes of both online service and offline batch workloads

  Workloads                 Software stack                         Workload trace
  Nutch Web Search Server   Apache Tomcat 6.0.26, Search (Nutch)   Sogou (http://www.sogou.com/labs/dl/qe.html)
  Hadoop                    Hadoop 1.0.2                           Facebook (https://github.com/swimprojectUCB/SWIM/wiki)
  Shark                     Shark 0.8.0                            Google data center (https://code.google.com/p/googleclusterdata/)