Verteego VDS Documentation

Verteego VDS Documentation Release 1.0 Verteego May 31, 2017

Installation

1 Getting started
2 Ansible
    2.1 1. Install Ansible
    2.2 2. Clone installation package
    2.3 3. Install Verteego DS
    2.4 3. Sign in
    2.5 4. Custom settings
3 Docker
    3.1 1. Install Docker
    3.2 2. Clone installation package
    3.3 3. Install Verteego DS
    3.4 4. Sign in
4 Dataflow
5 Data cleaning
6 Analytics & dashboarding
7 Notebooks
8 Prediction
9 Examples
10 Contact us
11 Community


Verteego Data Science Suite was built to deliver a plug-and-play environment for data scientists, enabling both fast prototyping and scalable production deployments. Verteego's purpose is to provide a bundle of tools covering all common requirements of data scientists. Verteego is not just another big data platform but a best-of-breed mash-up of the most powerful big data and prediction solutions currently available.

Installation


CHAPTER 1 Getting started

Verteego Data Science Suite aims to gather the best data science tools into a single, consistent and powerful stack. Installing Verteego is fast and straightforward: no hassle, no complicated configuration. Follow the installation instructions in the chapters that follow.


CHAPTER 2 Ansible

Please note that the following installation instructions use two placeholders for local directories: VDS_ROOT and SSH_ROOT. Replace them with the correct directories on your system.

- VDS_ROOT: local directory into which the installation package will be cloned
- SSH_ROOT: local .ssh directory

1. Install Ansible

We'll use Ansible to deploy Verteego DS to your remote server or Virtualbox and to orchestrate the automatic installation process. If you don't have Ansible yet, please install it first.

- Linux: http://docs.ansible.com/ansible/intro_installation.html#latest-releases-via-apt-ubuntu
- Mac OS (not tested): http://docs.ansible.com/ansible/intro_installation.html
- Windows (not tested): http://docs.ansible.com/ansible/intro_installation.html

2. Clone installation package

Clone the following repository to your local machine (NOT to the remote server on which you want to run Verteego DS). We'll refer to the local clone as VDS_ROOT in the following.

    git clone https://github.com/verteego/vds.git
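One way to avoid retyping the two placeholders is to keep them in shell variables. The sketch below is an illustration rather than part of the Verteego package: the default locations ($HOME/vds and $HOME/.ssh) and the helper name clone_vds are assumptions to adapt to your system.

```shell
# Assumed default locations; override them to match your system.
VDS_ROOT="${VDS_ROOT:-$HOME/vds}"
SSH_ROOT="${SSH_ROOT:-$HOME/.ssh}"

# Hypothetical helper: clone the installation package into VDS_ROOT,
# skipping the clone when a checkout is already present.
clone_vds() {
    if [ -d "$VDS_ROOT/.git" ]; then
        echo "already cloned: $VDS_ROOT"
    else
        git clone https://github.com/verteego/vds.git "$VDS_ROOT"
    fi
}
```

After calling clone_vds once, writing "$VDS_ROOT" and "$SSH_ROOT" wherever the instructions say VDS_ROOT and SSH_ROOT keeps the later commands copy-pasteable.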

3. Install Verteego DS

3.1 Installation on Amazon Web Services

1. Configure account and fetch the required authentication files

Create a key pair and call it vds.

Change the permissions of the downloaded file (vds.pem) to 400:

    chmod 0400 Downloads/vds.pem

Copy it into the installation package:

    cp Downloads/vds.pem VDS_ROOT/deployment/ansible/files/aws/vds.pem

Configure the account rights.

Create access and secret keys.

Copy the access and secret keys into the keys.json file at VDS_ROOT/deployment/ansible/files/aws/keys.json

2. Launch installation

    ansible-playbook -i VDS_ROOT/deployment/ansible/hosts --private-key=VDS_ROOT/deployment/ansible/files/aws/vds.pem -u admin VDS_ROOT/deployment/ansible/setup_on_aws.yml

3.2 Installation on Google Cloud Platform

1. Install Google Cloud SDK

Before you start, make sure that you have an active Google Cloud Platform account and the GCloud SDK installed (to install the GCloud SDK: https://cloud.google.com/sdk/docs).

Configure your account and project:

    gcloud init

Generate an SSH key for GCloud:

    gcloud compute config-ssh

2. Set up the VDS environment on Google Cloud

Create a Google service account:

- Go to https://console.cloud.google.com/iam-admin/serviceaccounts
- Select the project in which you want to create the VDS instance
- Create a service account with the project editor role
- Check the "Furnish a new private key" option
- Choose the JSON key type
- When you click the Create button, a key file will be downloaded. Copy the downloaded key file to VDS_ROOT/deployment/ansible/files/gcp and rename it to ansible.json:

    cp Downloads/ORIGINAL_KEYFILE.json VDS_ROOT/deployment/ansible/files/gcp/ansible.json

3. Install libcloud

    sudo apt-get install python-pip
    sudo pip install apache-libcloud==1.5.0
    # in case you encounter an ssl certificate validation issue (https://libcloud.readthedocs.io/en/latest/other/ssl-certificate-validation.html#ssl-certificate-validation-in-v2-0)
    sudo pip install --upgrade certifi

4. Launch installation

This will launch the default installation of Verteego Data Suite. For custom settings such as instance calibration, see the "Custom settings" section.

    ansible-playbook -i VDS_ROOT/deployment/ansible/hosts --private-key=SSH_ROOT/google_compute_engine VDS_ROOT/deployment/ansible/setup_gc_instance.yml

Be patient: deploying all files can take a while, depending on the capacity of the instance you've chosen.

5. Start playing

When the installation process has finished, open a browser and navigate to the external IP of the newly created instance on port 33330: http://GC_INSTANCE_EXTERNAL_IP:33330. You can find the external IP address on your Google Cloud Compute Engine console page (https://console.cloud.google.com/compute/instances).

3.3 Installation on a local virtual server (Virtualbox) from sources

1. Install Virtualbox and Vagrant

- Install Virtualbox: https://www.virtualbox.org/wiki/Downloads
- Install Vagrant: https://www.vagrantup.com/docs/installation

2. Launch Vagrant

Go to the Vagrant directory (VDS_ROOT/vagrant) and launch Vagrant (this may take a while, as it downloads a full Debian image to install on Virtualbox):

    cd VDS_ROOT/vagrant
    vagrant up

3. Launch installation

    ansible-playbook -i VDS_ROOT/deployment/ansible/hosts --private-key=VDS_ROOT/vagrant/.vagrant/machines/vds/virtualbox/private_key VDS_ROOT/deployment/ansible/setup_on_vbox.yml

4. Start playing

Navigate to http://VIRTUALBOX_INSTANCE_IP:33330
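The chmod 0400 step in section 3.1 matters because SSH refuses private keys that other users can read. A minimal pre-flight check for the AWS key might look like the sketch below; the function name check_key and its messages are hypothetical, and the check deliberately accepts only mode 400, so it is stricter than what the Google Cloud or Vagrant keys need.

```shell
# Hypothetical pre-flight check: succeed only if the key file exists and
# its permissions are exactly 400 (read-only for the owner).
check_key() {
    key="$1"
    if [ ! -f "$key" ]; then
        echo "missing: $key"
        return 1
    fi
    # The first ten characters of `ls -l` are the file type and mode bits.
    perms=$(ls -l "$key" | cut -c1-10)
    if [ "$perms" != "-r--------" ]; then
        echo "wrong permissions ($perms): $key"
        return 1
    fi
    echo "ok: $key"
}
```

Running check_key on the copied vds.pem before ansible-playbook turns a late SSH failure into an immediate, readable message.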

3.4 Installation on a remote virtual private server (VPS)

Requirements:

- This playbook is designed to work on a Debian 8 distribution, so we assume your VPS is running Debian 8.
- You should be able to connect to your VPS using a private key without a password.
- You should know your VPS's public IP.
- The remote user should be part of the sudoers group, because we need sudo privileges to run all commands.
- Your server should expose the port range 33330 to 33335 to enable external access to the Verteego Data Suite.

1. Install VDS

    # Pay attention to the comma after the VPS_PUBLIC_IP
    ansible-playbook \
        -i 'VPS_PUBLIC_IP,' \
        --private-key=PATH_TO_VPS_PRIVATE_SSH_KEY \
        -u REMOTE_USER \
        VDS_ROOT/setup_on_vps.yml

2. Start playing

Navigate to http://VPS_PUBLIC_IP:33330

3. Sign in

For your first sign-in you can use the following credentials. For security reasons, remember to change them or delete the default user after your first login.

    Username: vds-user
    Password: verteego

4. Custom settings

Customize infrastructure settings

Your installation can easily be customised using the different .yml files in the VDS_ROOT/deployment/ansible directory.

Example: use a high-memory instance on Google Cloud

- Open VDS_ROOT/deployment/ansible/setup_gc_instance.yml
- In the vars:machine_type variable, replace n1-standard-1 with n1-highmem-16 (see https://cloud.google.com/compute/docs/machine-types)

You can also specify settings directly on the command line using the --extra-vars parameter when running ansible-playbook.

Example: use a high-memory instance on Google Cloud and deploy the instance in a different zone

    ansible-playbook \
        -i VDS_ROOT/deployment/ansible/hosts \
        --private-key=SSH_ROOT/google_compute_engine \
        VDS_ROOT/deployment/ansible/setup_gc_instance.yml \
        --extra-vars "ginstance_type=n1-highmem-16 gzone=us-central1-f"

Customize application settings

Open VDS_ROOT/deployment/ansible/group_vars/all/vars_file.yml to change the default settings of the different applications that make up Verteego Data Suite.
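When several instance sizes or zones are being compared, it can help to generate the command line instead of editing it by hand. The following is only a sketch: the helper name gc_install_cmd and the default directories are assumptions, while ginstance_type and gzone are the variable names used in the --extra-vars example above. The function prints the command and does not run it.

```shell
# Assumed default locations, as placeholders for your own directories.
VDS_ROOT="${VDS_ROOT:-$HOME/vds}"
SSH_ROOT="${SSH_ROOT:-$HOME/.ssh}"

# Hypothetical helper: print (do not run) the Google Cloud installation
# command for a given machine type and zone.
gc_install_cmd() {
    instance_type="$1"
    zone="$2"
    # Each argument is printed on its own line, followed by " \".
    printf '%s \\\n' \
        "ansible-playbook" \
        "    -i $VDS_ROOT/deployment/ansible/hosts" \
        "    --private-key=$SSH_ROOT/google_compute_engine" \
        "    $VDS_ROOT/deployment/ansible/setup_gc_instance.yml"
    printf '    --extra-vars "ginstance_type=%s gzone=%s"\n' "$instance_type" "$zone"
}
```

For example, gc_install_cmd n1-highmem-16 us-central1-f prints a five-line command that can be reviewed and then pasted into the terminal.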


CHAPTER 3 Docker

1. Install Docker

Head to https://www.docker.com/community-edition#/download, then download and install Docker for your specific platform. Docker is supported on Windows, Mac OS and Linux. Similarly, head to https://docs.docker.com/compose/install/ and install docker-compose.

Once installed, you'll have access to the docker and docker-compose commands from the command line.

    docker --version
    output : Docker version 17.03.1-ce, build c6d412e

    docker-compose --version
    output : docker-compose version 1.12.0, build b31ff33

2. Clone installation package

    git clone https://github.com/verteego/vds.git

3. Install Verteego DS

3.1 Using docker-compose on a single machine

Using the command line, go to the directory deployment/docker inside the repository cloned in step 2.

    docker-compose up

Navigate to http://localhost:33330

4. Sign in

For your first sign-in you can use the following credentials. For security reasons, remember to change them or delete the default user after your first login.

    Username: vds-user
    Password: verteego
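After docker-compose up, the services may need some time before port 33330 answers. The sketch below polls the URL until it responds; the helper name wait_for_vds, the retry defaults and the assumption that the start page returns a non-error HTTP status are all illustrative, and curl must be installed.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 10).
wait_for_vds() {
    url="$1"
    tries="${2:-30}"
    delay="${3:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f: treat HTTP errors as failure, -s: silent, -o: discard the body
        if curl -fs --max-time 5 -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "not reachable: $url"
    return 1
}
```

A possible use is to start the stack in the background with docker-compose up -d and then call wait_for_vds http://localhost:33330 before opening the browser.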

CHAPTER 4 Dataflow

The dataflow is the backbone of your data science projects. It describes the steps your data runs through, from its source to prediction and visualization. The dataflow module is based on Apache NiFi, one of the most powerful data routing and transformation tools currently available. It offers a production-ready and highly configurable platform that works with a broad range of data sources, formatting standards (XML, JSON, Avro, CSV, ...) and big data technologies (Hadoop, Kafka, HDFS, Flume, Elasticsearch, HBase, Couchbase, MongoDB, Solr, Splunk, Lumberjack, Cassandra, Hive, ...). To learn more about the dataflow technology, see the Apache NiFi documentation.


CHAPTER 5 Data cleaning

The data cleaning module provides a powerful user interface for defining a stack of data cleaning tasks that can be applied to any file running through your dataflow.

Technology

The data cleaning module is powered by Google's Open Refine technology. The full user documentation is available from the Open Refine project.

General Refine Expression Language (GREL)

The data cleaning module comes with a powerful expression language that lets you apply precise transformation and cleaning tasks to your data. Check out the GREL documentation.

Running cleaning tasks in dataflow

The dataflow module offers special processors that apply a cleaning script to a data flow. You can find them among the other processors by typing openrefine in the processor search field. Currently there are processors for XLS, XLSX, CSV and TSV files.


CHAPTER 6 Analytics & dashboarding

Verteego Data Science Suite offers comprehensive analytical and dashboarding capabilities that let you slice, dice and visualize your data at any step of your dataflow.

Technology

The analytics and dashboarding module is powered by Superset, a data exploration platform designed to be visual, intuitive and interactive. To dive deeper into the technology, please check out the Superset GitHub repository and the user group.

Database connections

The analytics and dashboarding module can connect to a variety of databases through the SQLAlchemy library.


CHAPTER 7 Notebooks

Notebooks enable you to write custom scripts that can be run within your dataflow using the ExecuteProcess and ExecuteStreamCommand processors. Verteego comes with a few pre-installed kernels that cover the languages most commonly used by data scientists:

- Python 2.7
- R
- Bash

Technology

The integrated notebooks run on Jupyter.

Install additional kernels

If you need programming languages that are not pre-installed with your Verteego package, you can find more in the list of community-supported language kernels.
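Since Bash is among the pre-installed kernels, a script meant for ExecuteStreamCommand can be prototyped as a plain filter. The sketch below assumes the usual behaviour of that processor, where the incoming content arrives on standard input and whatever the command writes to standard output becomes the new content; the function name clean_lines and the cleaning rules are illustrative only.

```shell
# Hypothetical filter: read text on stdin, strip trailing whitespace,
# drop empty lines, and write the result to stdout.
clean_lines() {
    sed -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

The filter can be tried in a notebook cell or terminal by piping a sample file through it, for instance clean_lines < sample.csv, before wiring the same command into the dataflow.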


CHAPTER 8 Prediction

The prediction module offers a simple and powerful way to produce predictive models without writing a line of code. Supported algorithms are:

- Aggregator
- Deep Learning
- Distributed Random Forest
- Gradient Boosting Machine
- Generalized Linear Modeling
- Generalized Low Rank Modeling
- K-means
- Naive Bayes
- Principal Components Analysis

Technology

Verteego Data Science Suite's predictive technology runs on the open source software H2O. The full documentation and some tutorials are available in the H2O documentation.

Predictive modelling

Models trained on H2O are packaged as Plain Old Java Objects (POJO) that can easily be deployed and executed within your dataflow. Check out some examples of production environments.


CHAPTER 9 Examples

Here are a few examples of what we and our users have built with the Verteego Data Science Suite:

- Use Case HR: Reduce employee attrition and make talents stay longer (Part 1: Data Analysis)
- Use Case HR: Reduce employee attrition and make talents stay longer (Part 2: Prediction)
- Clean and verify emails

All scripts belonging to these examples can be downloaded from the Verteego GitHub.

Prediction deployment examples

The H2O documentation offers some examples of prediction deployments.


CHAPTER 10 Contact us

Need support, detected an issue or have a question? Drop a message to our community support!


CHAPTER 11 Community

Drop a message and we'll do our best to help you out as fast as we can. You can also reach us on Twitter.