Building A Better Test Platform:

Building A Better Test Platform: A Case Study of Improving Apache HBase Testing with Docker
Aleks Shulman, Dima Spivak

Outline
- About Cloudera
- Apache HBase overview
- API compatibility
- API compatibility testing framework
- Implementation and challenges
- Intro to containers and Docker
- API compatibility testing framework, revisited
- Conclusions

About Cloudera
- Distributes CDH, Cloudera's 100% open-source distribution including Apache Hadoop.
- Distributes Cloudera Manager (CM), a proprietary monitoring and management layer atop CDH.
- Contributes heavily to the open source community.
  - Employs 50 committers, holding 84 committerships.
  - 18 Apache projects started by Clouderans, including Flume, Sqoop, and Sentry.

What is Apache HBase?
- Apache HBase is the Hadoop database, a distributed, scalable, big data store.
- Built atop HDFS.
- Low-latency, consistent, random access.
- Modeled after Google's Bigtable paper.
- Non-relational. NoSQL.

SQL vs. HBase

SQL       HBase API
CREATE    create
ALTER     modify*
DROP      disable, drop
INSERT    put
SELECT    get, scan
UPDATE    put
DELETE    delete

HBase APIs
- Java: full-featured.
- REST: easy to use.
- Thrift: multi-language support.

CDH and Apache HBase
[Figure: chart mapping the minor releases of CDH3.x, CDH4.x, and CDH5.x to the HBase lines they ship: HBase 0.90.x, 0.92.x, 0.94.x, 0.96.x, and 0.98.x.]

API Compatibility
- RPC compatibility: incompatibilities around serialization/deserialization.

    17:22:15 Exception in thread "main" java.lang.IllegalArgumentException: Not a host:port pair: PBUF
    17:22:15 *
    17:22:15 api-compat-8.ent.cloudera.com(
    17:22:15 at org.apache.hadoop.hbase.util.Addressing.parseHostname(Addressing.java:60)
    17:22:15 at org.apache.hadoop.hbase.ServerName.<init>(ServerName.java:101)
    17:22:15 at org.apache.hadoop.hbase.ServerName.parseVersionedServerName(ServerName.java:283)
    17:22:15 at org.apache.hadoop.hbase.MasterAddressTracker.bytesToServerName(MasterAddressTracker.java:77)
    17:22:15 at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:61)
    17:22:15 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:703)
    17:22:15 at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:126)
    17:22:15 at Client_4_3_0.setup(Client_4_3_0.java:716)
    17:22:15 at Client_4_3_0.main(Client_4_3_0.java:63)
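The trace is what one would expect when an older client reads a master address that a newer server has written in a different serialization. The following is a minimal Python sketch of that failure mode, not HBase code: it assumes the old client expects a plain `host:port` string and that the newer server writes a binary payload beginning with the marker `PBUF`. The function name, the port, and the payload bytes are all hypothetical.

```python
from typing import Tuple


def parse_host_port(raw: bytes) -> Tuple[str, int]:
    """Parse 'host:port' the way an old-style client might (illustrative only)."""
    text = raw.decode("utf-8", errors="replace")
    host, sep, port = text.rpartition(":")
    # Reject anything that is not exactly "<host>:<numeric port>".
    if not sep or not host or not port.isdigit():
        raise ValueError(f"Not a host:port pair: {text!r}")
    return host, int(port)


# What the old client expects to find (hypothetical value).
old_format = b"node-1.internal:60000"
# What a newer server might write instead: a marker plus binary data (hypothetical bytes).
new_format = b"PBUF\x0a\x1dnode-1.internal"
```

Calling `parse_host_port(old_format)` succeeds, while `parse_host_port(new_format)` raises `ValueError`, mirroring the "Not a host:port pair: PBUF" message in the trace.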

API Compatibility
- Binary compatibility: client dependencies changing in an incompatible way can result in a runtime exception.
- Examples:
  - HTable constructors removed in CDH4.2:
      public HTable(final String tableName)
      public HTable(final byte[] tableName)
  - HBASE-8273: change of HColumnDescriptor setter return types. The setters previously returned void; the switch to the builder pattern changed this.
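Why a return-type change breaks already-compiled clients: the JVM links a call site by method name plus a descriptor that encodes the parameter types and the return type. A setter that changes from returning void to returning its own class therefore becomes a different method to old bytecode, even though the source would recompile cleanly. The Python sketch below only builds the descriptor strings to make the difference visible; the single `int` parameter is an assumed example signature, and the helper names are hypothetical.

```python
# JVM descriptor codes for a few primitive types.
JVM_PRIMITIVES = {"void": "V", "int": "I", "boolean": "Z", "long": "J"}


def type_descriptor(java_type: str) -> str:
    """Return the JVM descriptor for a primitive or a fully qualified class name."""
    if java_type in JVM_PRIMITIVES:
        return JVM_PRIMITIVES[java_type]
    return "L" + java_type.replace(".", "/") + ";"


def method_descriptor(params, returns) -> str:
    """Build '(params)return', the form a compiled call site refers to."""
    args = "".join(type_descriptor(p) for p in params)
    return "(" + args + ")" + type_descriptor(returns)


# Assumed example: a setter taking one int, before and after the builder-style change.
before = method_descriptor(["int"], "void")
after = method_descriptor(["int"], "org.apache.hadoop.hbase.HColumnDescriptor")
```

Here `before` is `(I)V` and `after` is `(I)Lorg/apache/hadoop/hbase/HColumnDescriptor;`. A client compiled against the first descriptor will not find a matching method in a jar that only provides the second.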

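API Compatibility
[Figure: three columns (HBase application, HBase client, HBase server). Binary compatibility applies between the application and the client; RPC compatibility applies between the client and the server. Under CDH4, application v1.0 pairs with client and server 0.92.x, and application v1.1 with client and server 0.94.x. Under CDH5, client and server are 0.96.x, and a partial rewrite of the application is likely necessary.]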

Testing
1. Build cluster with version X.
2. Run API client with version Y.
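The two steps imply a matrix of (server X, client Y) combinations. Enumerating that matrix could look like the sketch below; the version strings are placeholders chosen for illustration, not the set the authors actually tested.

```python
from itertools import product

# Placeholder version lists (assumptions for illustration).
SERVER_VERSIONS = ["cdh4.2.0", "cdh5.0.0"]
CLIENT_VERSIONS = ["cdh4.2.0", "cdh5.0.0"]


def compatibility_matrix(servers, clients):
    """Return every (server X, client Y) pair to test, in a stable order."""
    return list(product(servers, clients))


matrix = compatibility_matrix(SERVER_VERSIONS, CLIENT_VERSIONS)
for server, client in matrix:
    print(f"build cluster {server}, run client {client}")
```

Same-version pairs are kept on purpose: they serve as a baseline that should always pass.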

API testing implementation
Provisioning
- Provision host on private cloud.
- Static VMs.
Tarball framework
- Compile and package CDH server bits (version X).
- Start up the services as a single-node pseudo-distributed cluster (version X).
- Compile and run CDH client (version Y).
Problems
- Failure modes
  - Infrastructure (VMs, hosts).
  - Artifacts.
- Additional complexity
  - Special configurations (e.g. Kerberos).
  - Clean-up jobs for hosts.
  - Adding/maintaining server and client versions.
- No buffer between provisioning, packaging, deployment, and running tests.

Implementation alternatives
- Can we reuse existing tools?
  - CDH packaging
  - Cloudera Manager*
- Can we maximize utilization of computing resources?

Linux Containers
- Isolated environments on a Linux host
- cgroups (2.6.24+)
  - Isolating resources (memory, CPU, networking)
- Namespace isolation (filesystems, process trees)

Linux Containers
[Figure: virtual machines vs. containers. Virtual machines: each VM stacks user processes and libraries on its own guest OS, all running on a hypervisor above the host operating system. Containers: user processes and libraries run directly on the host operating system.]

Linux Containers
Advantages
- Low overhead
- Fast startup and shutdown
Disadvantages
- Only host kernel-compatible operating systems
- Less isolation*

Docker
- User front-end for containers
- Container management (start, stop, pause)
- Images (templates for containers)
- Registries (repository for images)

Docker
- docker run: start container from image
- docker build: create image from Dockerfile
- docker commit: create image from container

Use cases
Single-process applications

    root@savard:~# docker run ubuntu:12.04 echo "Hello world"
    Hello world
    root@savard:~# _

Persistent containers
- Containers as daemons.
- init process

    root@savard:~# docker run -d ubuntu:12.04 /sbin/init

API testing implementation, revisited
Problems
- Failure modes
  - Infrastructure (VMs, hosts).
  - Artifacts.
- Additional complexity
  - Special configurations (e.g. Kerberos).
  - Clean-up jobs for hosts.
  - Adding/maintaining server and client versions.
Solutions
- Portability
  - Images, internal registry.
  - Artifacts built in.
- Reduce complexity
  - Package configurations in images.
  - Containers initialize to same state.
  - No need to maintain all-in-one image creation logic.
- Separated packaging, cluster deployment, and running tests; environment becomes a test parameter.

API testing implementation, revisited
HBase server
- Persistent containers.
- Run CM/CDH in a set of containers.
  - 1 container per CM host.
  - 1 container for DNS server.
- Create any size cluster from 2 images.
- Provision physical host once, switch cluster versions easily.
API client
- Single-process application.

Building cluster images
Build cluster: start container cluster
- DNS server: dnsserver (10.0.0.2)
- Node: node-1 (10.0.0.3)
- Node: node-2 (10.0.0.4)

Building cluster images
- DNS server: dnsserver (10.0.0.2)
- Master node: node-1 (10.0.0.3)
- Slave node: node-2 (10.0.0.4)
Steps
1. Deploy CM/CDH
2. Validate cluster health
3. Stop services
4. docker commit

Starting cluster containers
Start cluster
- DNS server: dnsserver (10.0.0.2)
- Node: node-1 (10.0.0.3), master
- Node: node-2 (10.0.0.4), slave
- Node: node-3 (10.0.0.5), slave
- Node: node-4 (10.0.0.6), slave

Starting cluster containers
- DNS server: dnsserver (10.0.0.2)
- Node: node-1 (10.0.0.3)
- Node: node-2 (10.0.0.4)
- Node: node-3 (10.0.0.5)
- Node: node-4 (10.0.0.6)
Steps
1. Configure cluster
2. Start services
3. Validate cluster health
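The naming and addressing pattern on these slides (a DNS container first, then node-1 through node-N on consecutive addresses) can be generated for any cluster size. The sketch below is a hypothetical helper, not the authors' tooling; the default starting address is taken from the slides, and the function name is an assumption.

```python
import ipaddress


def plan_cluster(num_nodes: int, first_ip: str = "10.0.0.2"):
    """Return (hostname, ip) pairs: the DNS server first, then node-1..node-N."""
    if num_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    base = ipaddress.IPv4Address(first_ip)
    plan = [("dnsserver", str(base))]
    for i in range(1, num_nodes + 1):
        # Nodes take the addresses immediately after the DNS server.
        plan.append((f"node-{i}", str(base + i)))
    return plan


for hostname, ip in plan_cluster(4):
    print(f"{ip}\t{hostname}")
```

Because every node comes from the same image, only `num_nodes` changes between a two-node build cluster and a four-node test cluster.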

Running API client
- DNS server: dnsserver (10.0.0.2)
- Node: node-1 (10.0.0.3)
- Node: node-2 (10.0.0.4)
- Node: node-3 (10.0.0.5)
- Node: node-4 (10.0.0.6)

    docker run ubuntu12.04_cdh5.0.0_hbase_client

- Script executing org.apache.hadoop.hbase.client.TestFromClientSide on distributed cluster.
- Pass in environment variables:
    -e HBASE_ZK_QUORUM=node-1.internal
    -e MODE=<rpc|binary>
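Since the client is a single-process container configured entirely through environment variables, a test harness only has to assemble one `docker run` command per case. The sketch below builds, but does not execute, that command. The function name is hypothetical, and treating `rpc` and `binary` as the only valid `MODE` values is an assumption read off the slide.

```python
import shlex

VALID_MODES = ("rpc", "binary")  # assumed from the slide's MODE placeholder


def client_command(image: str, zk_quorum: str, mode: str):
    """Build the argv for running the API client container against a cluster."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        "docker", "run",
        "-e", f"HBASE_ZK_QUORUM={zk_quorum}",
        "-e", f"MODE={mode}",
        image,
    ]


argv = client_command("ubuntu12.04_cdh5.0.0_hbase_client", "node-1.internal", "rpc")
# Shell-quoted form, handy for logging what the harness is about to run.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the command as a list makes it safe to hand to `subprocess.run` later without shell quoting issues; swapping the image name switches the client version under test.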

Running API client
- DNS server: dnsserver (10.0.0.2)
- Node: node-1 (10.0.0.3)
- Node: node-2 (10.0.0.4)
- Node: node-3 (10.0.0.5)
- Node: node-4 (10.0.0.6)

    docker run ubuntu12.04_cdh5.0.0_hbase_client

Steps
1. Archive output
2. Reuse/destroy cluster

Conclusions
- Recognized automation that was more trouble than it was worth.
  - More effort went into setting up the test environment than into running the tests.
- Overhauled test framework using Docker.
  - Maximized utilization of computing resources.
  - Made environment a test parameter.
- Further work: consider workflows beyond compatibility testing.
  - Upgrade testing.
  - Failure injection.