Next-Generation Cloud Platform

Similar documents
NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

CISC 7610 Lecture 2b The beginnings of NoSQL

Introduction to Database Services

Embedded Technosolutions

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

EsgynDB Enterprise 2.0 Platform Reference Architecture

Middle East Technical University. Jeren AKHOUNDI ( ) Ipek Deniz Demirtel ( ) Derya Nur Ulus ( ) CENG553 Database Management Systems

CLOUD COMPUTING It's about the data. Dr. Jim Baty Distinguished Engineer Chief Architect, VP / CTO Global Sales & Services, Sun Microsystems

Scalability of web applications

Windows Servers In Microsoft Azure

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

Colin Cunningham, Intel Kumaran Siva, Intel Sandeep Mahajan, Oracle 03-Oct :45 p.m. - 5:30 p.m. Moscone West - Room 3020

Scalable Tools - Part I Introduction to Scalable Tools

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

OPENSTACK: THE OPEN CLOUD

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

Deployment Planning and Optimization for Big Data & Cloud Storage Systems

Chapter 5. The MapReduce Programming Model and Implementation

VMware Virtual SAN Technology

Advanced Database Systems

Data Centers and Cloud Computing

Architecture of a Real-Time Operational DBMS

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Challenges for Data Driven Systems

Oracle IaaS, a modern felhő infrastruktúra

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Efficient On-Demand Operations in Distributed Infrastructures

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE

Apache Hadoop Goes Realtime at Facebook. Himanshu Sharma

Distributed File Systems II

CIT 668: System Architecture. Amazon Web Services

Safe Harbor Statement

HPE SimpliVity. The new powerhouse in hyperconvergence. Boštjan Dolinar HPE. Maribor Lancom

Developing Enterprise Cloud Solutions with Azure

DIVING IN: INSIDE THE DATA CENTER

2013 AWS Worldwide Public Sector Summit Washington, D.C.

Advanced Database Technologies NoSQL: Not only SQL

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)?

Next Generation Storage for The Software-Defned World

Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX

Cloud Providers more AWS, Aneka

DATABASE SCALE WITHOUT LIMITS ON AWS

Data Centers and Cloud Computing. Slides courtesy of Tim Wood

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

Data Centers and Cloud Computing. Data Centers

Leveraging the power of Flash to Enable IT as a Service

Big Data in OpenStack Storage

Nested Virtualization and Server Consolidation

Distributed Systems. 31. The Cloud: Infrastructure as a Service Paul Krzyzanowski. Rutgers University. Fall 2013

Service Oriented Performance Analysis

Scaling DreamFactory

CS 655 Advanced Topics in Distributed Systems

Cloud Computing 4/17/2016. Outline. Cloud Computing. Centralized versus Distributed Computing Some people argue that Cloud Computing. Cloud Computing.

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

Best Practices for Validating the Performance of Data Center Infrastructure. Henry He Ixia

When, Where & Why to Use NoSQL?

Introduction to Distributed Data Systems

Presented by Sunnie S Chung CIS 612

DISTRIBUTED SYSTEMS [COMP9243] Lecture 8a: Cloud Computing WHAT IS CLOUD COMPUTING? 2. Slide 3. Slide 1. Why is it called Cloud?

MapReduce: Simplified Data Processing on Large Clusters 유연일민철기

Cloud Storage with AWS: EFS vs EBS vs S3 AHMAD KARAWASH

Cloud Analytics and Business Intelligence on AWS

Hyper-Convergence De-mystified. Francis O Haire Group Technology Director

Faculté Polytechnique

SQL Server 2014 Upgrade

Top 40 Cloud Computing Interview Questions

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines

MySQL In the Cloud. Migration, Best Practices, High Availability, Scaling. Peter Zaitsev CEO Los Angeles MySQL Meetup June 12 th, 2017.

Welcome to the New Era of Cloud Computing

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

Course Overview. ECE 1779 Introduction to Cloud Computing. Marking. Class Mechanics. Eyal de Lara

Altaro VM Backup vs. Unitrends

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies

Microsoft Windows Embedded Server Overview

Developing Microsoft Azure Solutions (70-532) Syllabus

Rule 14 Use Databases Appropriately

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

Cloud Performance Simulations

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

CSE-E5430 Scalable Cloud Computing Lecture 9

Mellanox InfiniBand Solutions Accelerate Oracle s Data Center and Cloud Solutions

@joerg_schad Nightmares of a Container Orchestration System

SCALABLE CONSISTENCY AND TRANSACTION MODELS

The Intersection of Cloud & Solid State Storage

Introducing SUSE Enterprise Storage 5

Enosis: Bridging the Semantic Gap between

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

5 Fundamental Strategies for Building a Data-centered Data Center

Nutanix Tech Note. Virtualizing Microsoft Applications on Web-Scale Infrastructure

High Availability Without the Cluster (or the SAN) Josh Sekel IT Manager, Faculty of Business Brock University

RAMCloud. Scalable High-Performance Storage Entirely in DRAM. by John Ousterhout et al. Stanford University. presented by Slavik Derevyanko

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Transcription:

Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology (POSTECH)

Cloud companies run datacenters Cloud IT company = datacenter company

Large companies process big data 2

Smart companies make smart devices Smart company = smart device company

Smart devices + Cloud + Big Data How to make money? (or MORE money) 4

Outline Introduction Little more history Issues in cost-effective cloud Cost-effective cloud @ POSTECH Summary 5

The birth of cloud computing How did it start? Major IT companies (e.g., Google, Amazon, MS, etc.) We have too many computers, but mostly idle computers How can we make more money? Rest of the world (e.g., anyone using computers) We don t want to maintain expensive computers. How can we reduce our costs? Let s sell/buy computing as an on-line service! 6

The birth of big data How did it start? BIG, BIG, BIG data Existing things are not working any more Scalability o Can handle increasing amount of data? Fault tolerance o Can work with failed storage? Read & Write o Can access the data as we used to do? Processing data o Can work on the large data on many nodes? 7

The birth of smart devices We carry small, but powerful computers Mostly standardized components ARM CPU, Moderate GPU, SSD,.. Google Android, Apple ios, Microsoft Windows 8,.. HD camera, scripting tool,.. App market (well, mostly games?),.. How to become a market leader? (or how to differentiate your devices?) Well, let s pack all these things fast and nicely! 8

Outline Introduction Little more history Issues in cost-effective cloud Performance Power RAS Analysis Storage Mobile Cost-effective cloud @ POSTECH Summary 9

Quality of Performance (1/2) No true multi thing! Virtual machine is only virtual > Resource contention exists ALUs, caches I/O devices Net bandwidth > Difficult performance analysis [Example 4-core CPU] 10

Execution Time Quality of Performance (2/2) Two programs on 2 VMs / 2 Cores One program on 1 VM / 1 Core Service you wanted!= Service you obtained 1KB 16KB 256KB 4MB 16MB QoS failure

Outline Introduction Little more history Issues in cost-effective cloud Performance Power RAS Analysis Storage Mobile Cost-effective cloud @ POSTECH Summary 12

Bad news for power: no work [Example Google server] Datacenter is usually idle Still consumes huge power We need 100% busy servers & 100% power-off servers for some periods 13

Outline Introduction Little more history Issues in cost-effective cloud Performance Power RAS Analysis Storage Mobile Cost-effective cloud @ POSTECH Summary 14

Systems do fail in field! Data Loss Device failure Computing error System failure RAS is one of key SLA for large-scale clusters 15

Datacenter can fail anytime, anywhere! Failure rate of datacenter (case of soft error) Mean Time Between Failures (MTBF) Average times between two failures per system. For a single system, After applying all the existing reliability techniques, we can achieve MTBF = 30 years = 10,000 days. For a data center, however, If we have 10K servers, our MTBF = 1 day. At least one system in the center fails everyday. More redundancy? Too expensive for a datacenter. 16

Outline Introduction Little more history Issues in cost-effective cloud Performance Power RAS Analysis Storage Mobile Cost-effective cloud @ POSTECH Summary 17

VM-level analysis is TOO LIMITED Difficult to analyze the performance of apps on VM

System-level analysis is TOO DIFFICULT [150GB sort using 640-thread CPUs using Hadoop @ Sun Microsystems] Various performance bottlenecks exist.

Timing simulation is TOO SLOW Typical simulation speed Speed granularity as Instruction Per Second (IPS) Real machine (e.g., Intel CPU) : 1,000,000,000 IPS Same ISA/arch functional VM simulator : 500,000,000 IPS Different ISA/arch functional VM simulator : 1,000,000 IPS Timing simulator (e.g., Flexus) : 1,000 IPS 1 min on real machine >1 year on timing simulator However, real-world workloads require long-period benchmarking (e.g., hours for TPCC on database engine) How to do cloud-level performance simulation? 20

Outline Introduction Little more history Issues in cost-effective cloud Performance Power RAS Analysis Storage Mobile Cost-effective cloud @ POSTECH Summary 21

BIG, BIG, BIG data Existing things are not working any more Scalability Can handle increasing amount of data? Fault tolerance Can work with failed storage? Read & Write Can access the data as we used to do? Processing data Can work on the large data on many nodes? How will cloud computing help big data? 22

Big Data: storage (i.e., file system) Performance vs Replication Big data stored in many disk drives Adding or removing disk drives Horizontal elasticity Separating control node and data objects (often unstructured) o meta node : know the physical locations of data o object node : store the data object Must maintain replications Slow performance Solutions with different tradeoffs (e.g., how to manage meta node and object node, global view,..) File system: Google FS (GFS), Hadoop FS (HDFS), Amazon Dynamo Cloud storage: Amazon S3, OpenStack Swift 23

Big Data: access (i.e., database) SQL vs NoSQL Conventional RDBMS systems cannot be scaled without sacrificing its lock/log-based ACID (Atomicity, Consistency, Isolation and Durability) Solutions supporting only parts of ACID (e.g., key-value access/column oriented /unstructured data) Google Bigtable: atomicity on single keys Yahoo PNUTS: serialized single-key writes timeline consistency Amazon SimpleDB: asynchronous writes eventual consistency 24

Example big data stack Google specific View Open-Source Implementations from Apache/Yahoo New file system Google File System (GFS) o Object-based distributed file system New database Google Big Table o Column-based, key-value based data access New data processing Google Map & Reduce o Distribute/collect jobs based on key-value grouping Amazon, MS, OpenStack, all have different views 25

Outline Introduction Little more history Issues in cost-effective cloud Performance Power RAS Analysis Storage Mobile Cost-effective cloud @ POSTECH Summary 26

Why mobile cloud computing? High performance Let s borrow the server-scale power E.g., 3D HD game using GPU @ datacenter Let s take advantage of cloud-scale information E.g., Amazon Silk cloud browser s predictive web caching Longer battery life Let s offloading mobile work to datacenter E.g., Intel CloneCloud, Microsoft MAUI Early time to market Let the cloud handle development-tricky things 27

Mobile cloud: task offloading What to offload? - Functions - Threads - processes - Tasks When to offload? - High performance - Long battery life These are ongoing research issues.. BTW, is the cloud free? 28

Outline Introduction Little more history Issues in cost-effective cloud Cost-effective cloud @ POSTECH Compute Cloud Storage Cloud Mobile Cloud Cloud Workload Summary 29

PosCloud: Advanced open-source based cloud/big data system @ POSTECH 30

Commercial cloud solutions are VERY expensive Can t use immature open-source solutions Lack of key features e.g., monitoring, migration, RAS, backup,... Can t afford commercial solutions costs up to 1000s of dollars per CPU + licensing fees (for advanced management features) Price for 1,000~10,000 nodes? How to modify commercial engines? 31

PosCloud: open-source implementation Postech Cloud Engine - monitoring - analysis - deployment - virtualization - live migration - power saving POSTECH Datacenter (100+ nodes) Tools OpenStack Condor Hadoop Xen Function Infrastructure as a Service (IaaS) Workload scheduling Scalable file system Virtualization 32

PosCloud: a big picture 33

PosCloud: dynamic resource management Quality of Service Reliability, Availability, Serviceability Slow Fail OFF OFF Cloud Engine OFF Scalable Node M anagement 34

PosCloud: system-wide monitoring 35

PosCloud: cost-effectiveness Service Quality Typical Open-source Typical Commercial PosCloud Cloud Computing Management IaaS Service Performance - + Power - + Recovery - + Availability - + Other Features -? Open-source Platform - - S/W costs ~$0 1000s of $ per CPU ~$0 Commercial-level services at near zero prices! 36

Outline Introduction Little more history Issues in cost-effective cloud Cost-effective cloud @ POSTECH Compute Cloud Storage Cloud Mobile Cloud Cloud Workload Summary 37

The importance of scalable storage Representative enterprise cloud storages Google File System Amazon S3 (Dynamo) Facebook Haystack Yahoo Walnut / HDFS And many more 38

Evaluating cloud storage Do existing solutions work well? Is it really fast? Is it really scalable? Is it really reliable? What we are focusing on Identifying the performance bottleneck of a system Using architectural support to improve existing storage system Many research challenges for system architects 39

Our storage cloud system OpenStack Swift Popular open-source object storage system for cloud environments Similar to enterprise storage system s architecture (Amazon s Dynamo) Scalable and Reliable State-of-the-art cluster Intel SandyBridge Xeon processors 500TB storage capacity (SSD/HDD) 30+ storage servers 40

Outline Introduction Little more history Issues in cost-effective cloud Cost-effective cloud @ POSTECH Compute Cloud Storage Cloud Mobile Cloud Cloud Workload Summary 41

Mobile cloud: task offloading Multi-level offloading - Device Server Cloud.. Load balancer Dynamic load balancing - Device Desktop - Desktop Server - Server Cloud Dynamic offloading & balancing required! 42

PosCloud: mobile offloading Low Power High Performance Low Costs PosCloud Smart Devices 43

Outline Introduction Little more history Issues in cost-effective cloud Cost-effective cloud @ POSTECH Compute Cloud Storage Cloud Mobile Cloud Cloud Workload Summary 44

Making realistic cloud workloads What a new workload should have Support for new infrastructure Cloud services are based on distributed, scalable system Realistic dataset Cloud applications handle massive dataset,.. No SQL, Map-Reduce, Web server, Mail server, Multimedia,.. Evaluating virtualization performance Cloud vendors use virtualization techniques to better utilize hardware resources 45

Two workloads on PosCloud CloudSuite [from EPFL] Benchmark suite consists of scale-out applications Covers broad range of applications: 6 different categories SPECvirt [from www.spec.org] Performance evaluation of a single datacenter server Covers all system components: hardware, virtualization platform, virtualized guest OS and application software 46

PosCloud: Real-world workloads CloudSuite Data Serving Serving data queries in a scalable nosql storage system MapReduce Scalable machine learning library on Hadoop Media Streaming RTP/RTSP streaming server Software Testing Automated real-world software testing Web Serving/Search Search-oriented dynamic Web server 47

PosCloud: Real-world workloads SpecVirt OS : Centos 5.6 Virtualization : KVM-83 Webserver : Apache 2 with PHP 5 Infraserver : Apache 2 with fast-cgi Appserver : Oracle Glassvish v2 Mailserver : Dovecot 1.2.17 DBserver : PostgreSQL 8 48

SPECvirt : Throughput + QoS test Metric: How many VMs can be run, while maintaining target QoS? 49

Research projects under PosCloud Mobile offloading Representative real-world mobile applications Potential workload reduction Datacenter performance monitoring Performance counter virtualization Resource contention identification Fast, live migration of virtual machines Quality-of-Service guarantee Load balancing for power re-cycling Big data management Scalable object-oriented storage engine SSD-HDD hybrid storage 50

Summary Mobile cloud computing = Big Future (but, must be cost-effective!) What we do @ POSTECH PosCloud Mobile workload offloading Quality-of-Service guarantee Performance monitoring VM, process, function migration Cloud/Big Data workloads 51

Question? Thank You! Jangwoo Kim e-mail: jangwoo@postech.ac.kr http://www.postech.ac.kr/~jangwoo 52