GoDocker. A batch scheduling system with Docker containers

Similar documents
Think Small to Scale Big

Using DC/OS for Continuous Delivery

Federated Prometheus Monitoring at Scale

Package your Java Application using Docker and Kubernetes. Arun

70-532: Developing Microsoft Azure Solutions

70-532: Developing Microsoft Azure Solutions

Scale your Docker containers with Mesos

Part2: Let s pick one cloud IaaS middleware: OpenStack. Sergio Maffioletti

Kubernetes introduction. Container orchestration

A curated Domain centric shared Docker registry linked to the Galaxy toolshed

Architecting for Failure in a Containerized World. Tom Faulhaber Infolace

Zero to Microservices in 5 minutes using Docker Containers. Mathew Lodge Weaveworks

An Introduction to Kubernetes

Kubernetes: Twelve KeyFeatures

Important DevOps Technologies (3+2+3days) for Deployment

Fixing the "It works on my machine!" Problem with Docker

Mesosphere and Percona Server for MongoDB. Jeff Sandstrom, Product Manager (Percona) Ravi Yadav, Tech. Partnerships Lead (Mesosphere)

AGILE DEVELOPMENT AND PAAS USING THE MESOSPHERE DCOS

CONTINUOUS DELIVERY WITH MESOS, DC/OS AND JENKINS

Using AWS to Build a Large Scale Dockerized Microservices Architecture. Dr. Oliver Wahlen moovel Group GmbH Frankfurt, 30.

OpenStack Magnum Pike and the CERN cloud. Spyros

Issues Fixed in DC/OS

Mesosphere and Percona Server for MongoDB. Peter Schwaller, Senior Director Server Eng. (Percona) Taco Scargo, Senior Solution Engineer (Mesosphere)

A REFERENCE ARCHITECTURE FOR DEPLOYING WSO2 MIDDLEWARE ON KUBERNETES

Operating Within Normal Parameters: Monitoring Kubernetes

What s New in Red Hat OpenShift Container Platform 3.4. Torben Jäger Red Hat Solution Architect

Performance Monitoring and Management of Microservices on Docker Ecosystem

Launching StarlingX. The Journey to Drive Compute to the Edge Pilot Project Supported by the OpenStack

Armon HASHICORP

DevOps Technologies. for Deployment

Container Orchestration on Amazon Web Services. Arun

PCP: Ingest and Export

Triangle Kubernetes Meet Up #3 (June 9, 2016) From Beginner to Expert

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.

Lightweight Containerization at Facebook

Scheduling Applications at Scale

Building/Running Distributed Systems with Apache Mesos

Monitoring system for geographically distributed datacenters based on Openstack. Gioacchino Vino

Understanding and Evaluating Kubernetes. Haseeb Tariq Anubhavnidhi Archie Abhashkumar

Kubernetes. Introduction

Advanced Continuous Delivery Strategies for Containerized Applications Using DC/OS

StreamSets Control Hub Installation Guide

MESOS A State-Of-The-Art Container Orchestrator Mesosphere, Inc. All Rights Reserved. 1

Wrapp. Powered by AWS EC2 Container Service. Jude D Souza Solutions Wrapp Phone:

Building an on premise Kubernetes cluster DANNY TURNER

CONTINUOUS DELIVERY WITH DC/OS AND JENKINS

Redesigning Apache Flink s Distributed Architecture. Till

EASILY DEPLOY AND SCALE KUBERNETES WITH RANCHER

TEN LAYERS OF CONTAINER SECURITY

Apache Storm: Hands-on Session A.A. 2016/17

Developing Microsoft Azure Solutions (70-532) Syllabus

The Art of Container Monitoring. Derek Chen

Building Kubernetes cloud: real world deployment examples, challenges and approaches. Alena Prokharchyk, Rancher Labs

Kuberiter White Paper. Kubernetes. Cloud Provider Comparison Chart. Lawrence Manickam Kuberiter Inc

Nuxeo Server 9.1 Release Notes

Kubernetes: What s New


Developing and Testing Java Microservices on Docker. Todd Fasullo Dir. Engineering

OpenShift 3 Technical Architecture. Clayton Coleman, Dan McPherson Lead Engineers

TEN LAYERS OF CONTAINER SECURITY

Red Hat OpenShift Roadmap Q4 CY16 and H1 CY17 Releases. Lutz Lange Solution

Hynek Schlawack. Get Instrumented. How Prometheus Can Unify Your Metrics

CloudBATCH: A Batch Job Queuing System on Clouds with Hadoop and HBase. Chen Zhang Hans De Sterck University of Waterloo

Developing Microsoft Azure Solutions (70-532) Syllabus

FROM MONOLITH TO DOCKER DISTRIBUTED APPLICATIONS

On-demand Authentication Infrastructure for Test and Development Andrew Leonard Dell EMC/Isilon

ENHANCE APPLICATION SCALABILITY AND AVAILABILITY WITH NGINX PLUS AND THE DIAMANTI BARE-METAL KUBERNETES PLATFORM

Developing Microsoft Azure Solutions (70-532) Syllabus

Introduction to Mesos and the Datacenter Operating System

SUG Breakout Session: OSC OnDemand App Development

Exam : Implementing Microsoft Azure Infrastructure Solutions

Port Usage Information for the IM and Presence Service

Cloud I - Introduction

Building a Data-Friendly Platform for a Data- Driven Future

Lessons Learned: Deploying Microservices Software Product in Customer Environments Mark Galpin, Solution Architect, JFrog, Inc.

Cloud Computing Introduction to Cloud Foundry

Open-Falcon A Distributed and High-Performance Monitoring System. Yao-Wei Ou & Lai Wei 2017/05/22

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

Spark and Flink running scalable in Kubernetes Frank Conrad

Continuous Integration and Deployment (CI/CD)

Upcoming Services in OpenStack Rohit Agarwalla, Technical DEVNET-1102

Linux Administration

We are ready to serve Latest IT Trends, Are you ready to learn? New Batches Info

Reliability, load-balancing, monitoring and all that: deployment aspects of UNICORE. Bernd Schuller UNICORE Summit 2016 June 23, 2016

Clarens Web Services Architecture

Container Networking and Openstack. Fernando Sanchez Fawad Khaliq March, 2016

Getting Started With Amazon EC2 Container Service

CHALLENGES IN A MICROSERVICES AGE: MONITORING, LOGGING AND TRACING ON OPENSHIFT. Martin Etmajer Technology May 4, 2017

Kubernetes objects on Microsoft Azure

FreeIPA Cross Forest Trusts

Scaling Jenkins with Docker and Kubernetes Carlos

VMware Integrated OpenStack with Kubernetes Getting Started Guide. VMware Integrated OpenStack 4.1

Seagull: A distributed, fault tolerant, concurrent task runner. Sagar Patwardhan

/ Cloud Computing. Recitation 5 February 14th, 2017

Project Kuryr. Antoni Segura Puimedon (apuimedo) Gal Sagie (gsagie)

VMware Integrated OpenStack with Kubernetes Getting Started Guide. VMware Integrated OpenStack 4.0

FogIoT Orchestrator: an Orchestration System for IoT Applications in Fog Environment

Azure Learning Circles

AWS 101. Patrick Pierson, IonChannel

Installation and setup guide of 1.1 demonstrator

Transcription:

GoDocker A batch scheduling system with Docker containers Web - http://www.genouest.org/godocker/ Code - https://bitbucket.org/osallou/go-docker Twitter - #godocker Olivier Sallou IRISA - 2016 CC-BY-SA

What Execute batch jobs/commands in containers For multi-user system (ldap based for example) With personal and/or common shared directories (home, central data, ) In a scalable architecture to handle massive job submission.

Why? Need for an open source scheduling job submission tool (like Sun Grid Engine) with isolation of resources availability of tools without cluster specific OS/version issues (with containers) with remote and authenticated access with access to job resource monitoring

How? Using proven technologies and software Using scalable components With plugin support to modify easily default behavior and adapt it to YOUR system.

Technologies Docker: for containers Docker Swarm, Apache Mesos, Kubernetes for job execution and dispatch, as well as for node monitoring. Google cadvisor: for job monitoring Language: Python Databases backend: MongoDB, Redis, InfluxDB (optional). Monitoring: Prometheus

Features Remote execution of a job (command line) in a Docker container with requested resources (cpu, memory) with requested directories mounted in container (according to ACL) Allowed container images can be limited to a list (config) User can specify the container image (config) Optional root access to container (config)

Features Interactive sessions (ssh) in a container User/Group priority and quotas. Jobs scheduling according to multiple properties (priority, waiting time, previous usage, ). Fair share algorithm available. Plugins to modify or add features. Global and per job monitoring (past and live). Partial DRMAA v1 support

Architecture CLI/ Web UI Scheduler Watchers Web proxy Submit task Monitor tasks MongoDB Dispatcher (Swarm, Mesos) Web servers Redis Execution nodes (with Mesos/Docker) Influxdb Shared file system

Databases Mongodb: used to store jobs data Redis: use lists to dispatch requests between executors to monitor jobs Influxdb: optional db to store time based data (cpu/memory usage, number of jobs, etc.)

Components CLI : Command Line Interface Web interface / REST API Authentication / ACLs => plugins Scheduler => plugins Executor => plugins Watchers => plugins

Command Line Interface Execute commands using the REST API of the web server: submit and control running jobs download output files from jobs etc. Some commands are dedicated to administrators: project and user quota manager etc.

Web server Submit and manage tasks via web UI REST interface for remote control Partial DRMAA v1 integration Register new tasks for scheduler.

Authentication / ACL A plugin is available to authenticate users with an LDAP but it may be adapted to your needs manage authentication for web site define which volumes/directories can be mounted in container (user home directory etc.), and their mode (ro, rw). Other plugins can be developed for specific authentication/acl

Scheduler Only one instance of the process The scheduler reorder job requests: per priority (user and/or group) reject if quota reached different algorithms are available: fifo fair(share) others can be added with plugins

Scheduler It executes the job command using the executor plugin: Docker Swarm Apache Mesos others can be developed manage port mapping for interactive jobs

Executor Multiple instances can be run to scale with the number of jobs to monitor. Manage kill or reschedule requests Checks the status of the job (running, over) Trigger watchers (see next slide) When job is over, it updates job status.

Watcher Watchers are plugins called by executors during job execution to act upon job life cycle: ex: kill job ex: update some meta-data New plugins can be added Available: Maxlifespanwatcher: kill a job after X days.

Monitoring Cadvisor helps to monitor live job cpu/memory usage. data can be saved in InfluxDB for later analysis. Previous jobs data are kept in MongoDB for statistics/analysis. Prometheus metrics endpoint for global statistics Prometheus can connect to cadvisor for container metrics Python logging: can export to Graylog etc.

Supervision Can register to etcd or consul to monitor processes With consul, possibility to use service discovery with his DNS for automatic load balancing to web servers.

Networking Support CNI (container network interface) for IP per container to avoid port mapping Via Docker plugins Via Mesos Unified Containerizer + CNI plugins

Deployment On premise Vagrant box for testing Docker images Chef and Amazon recipes

Job execution Script is executed in a container with his uid/gid User can ask for home directory and/or shared directories (read only if root) User can write in a per-job directory accessible from a gateway or HTTP Stdout/stderr are saved in job directory Live log of stdout Live resource usage monitoring of job When job is archived (config or manual), per-job directory is deleted

About Authors: Olivier Sallou (IRISA / Univ. Rennes 1) Cyril Monjeaud (IRISA/ Univ. Rennes 1) License: Apache-2.0 Web: http://www.genouest.org/godocker/ Code: https://bitbucket.org/osallou/go-docker Documentation: https://godocker.atlassian.net/wiki/display/god/godo CKER