IBM Data Science Experience (DSX) Partner Application Validation Quick Guide


IBM Data Science Experience (DSX) Partner Application Validation Quick Guide VERSION: 2.0 DATE: Feb 15, 2018 EDITOR: D. Rangarao

Table of Contents
1 Overview of the Application Validation Process
2 Platform Specific Considerations IBM DSX Family
2.1 Introduction
2.2 Architecture
2.3 Scope
3 Resources for IBM DSX Validation
3.1 Assistance
3.2 Platform Access
3.3 Recommended tests
4 Submit your solution to achieve the Ready for IBM Analytics Badge

5/31/17

1 Overview of the Application Validation Process

Ready for IBM Analytics is the IBM technical validation process that helps partners enhance their technology value proposition with IBM products, achieve optimal interoperability, and help ensure client satisfaction. This Application Validation Guide contains essential information about accessing our technology. Once approved, you will be able to download and use the Ready for IBM Analytics mark on collateral and be included in the Business Partner Application Showcase.

Section 3.3 below prescribes some basic tests that validate basic functionality. Application providers should also execute the additional tests that they currently use for unit and regression testing. We believe that you, the application providers, are the best judge of whether your application is functioning correctly. However, should you need technical assistance from our subject matter experts, we are here to work with you.

2 Platform Specific Considerations IBM DSX Family

2.1 Introduction

IBM Data Science Experience (DSX) is an interactive, collaborative environment where data scientists can use multiple tools to activate their insights. Data scientists can choose to use the Jupyter-based notebook interface with a choice of runtimes including R, Python and Scala, or use the embedded RStudio.

IBM DSX is available in different configurations, including:
- DSX Desktop (installed on a laptop or desktop)
- DSX Local (installed in a private cloud)
- DSX on the public cloud

The focus of this document is IBM DSX Local.

2.2 Architecture

IBM DSX Local runs on a Kubernetes cluster of servers and consists of the following components:

Control Plane (Master)
- Requires three master nodes to manage the entire cluster.
- Uses etcd as a key-value store that persists the cluster state and stores metadata about cluster service deployment and health.
- Uses Prometheus for monitoring and ELK for logging.

Storage
For data stores and storage management:
- Uses GlusterFS for storage management.
- Uses IBM Cloudant DB as the service metadata database.
- Uses Redis as the in-memory database.
- Uses Swift Object Store for user artifacts.
- Uses Elasticsearch DB for logs.

Deciding DSX Local configuration

For the best resiliency, you can set up a minimum of nine nodes: three for Control plane (one master and two for high availability), three for Storage (one primary and two for high availability), and three for Compute (one primary and two for high availability). Alternatively, you can set up three nodes: one node with Control plane, Storage, and Compute on it, and two extra nodes for high availability.

Figure: Architecture for a minimum of nine nodes
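The two configurations described above can be compared with a small sketch. It is only an illustrative model, not part of DSX Local: the role names and the helper functions (`dedicated_layout`, `converged_layout`, `replicas_per_role`) are hypothetical, and the three-node layout is assumed to place all three roles on every node, as the text suggests.

```python
from collections import Counter

# Role names are illustrative labels for the three tiers named in the text.
ROLES = ("control_plane", "storage", "compute")


def dedicated_layout(nodes_per_role=3):
    """Nine-node style layout: each node carries exactly one role."""
    return [{role} for role in ROLES for _ in range(nodes_per_role)]


def converged_layout(node_count=3):
    """Three-node style layout: every node carries all three roles (assumed)."""
    return [set(ROLES) for _ in range(node_count)]


def replicas_per_role(layout):
    """Count how many nodes in the layout host each role."""
    counts = Counter()
    for node_roles in layout:
        counts.update(node_roles)
    return dict(counts)


for name, layout in (("dedicated", dedicated_layout()),
                     ("converged", converged_layout())):
    print(name, len(layout), "nodes", replicas_per_role(layout))
```

Under these assumptions both layouts keep three instances of each role; the nine-node layout differs in that no node shares roles with another tier.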

Figure: Architecture for a minimum of three nodes

2.3 Scope

This guide is provided for partners who want to validate their application for the IBM DSX Local product, which is deployed both as an on-premises server and in private cloud configurations.

3 Resources for IBM DSX Validation

3.1 Assistance

If you need assistance with the process, please submit an inquiry to the Analytics Ecosystem team and we will promptly contact you to answer any questions you may have.

IBM ROLE | NAME | EMAIL | PHONE
Analytics Ecosystem | IBM Analytics Ecosystem Team | -team email address- |
DSX specialist | Deepak Rangarao | drangar@us.ibm.com | 973-216-8283

3.2 Platform Access

IBM DSX Desktop: Free Beta Version

IBM DSX Desktop is a free client for data scientists and data engineers. It includes Jupyter-based notebooks to create, import, or run code using one of the three runtimes (R, Python, Scala), and RStudio, a tool for statistical analysis and machine learning with R.

Link: IBM DSX Desktop

Note: In some circumstances partners might be able to use the DSX Desktop version to test their integration touch points while they wait to get access to a DSX Local installation.

IBM DSX Local: Community Edition; access to an IBM labs managed instance of IBM DSX

IBM DSX Local is an on-premises enterprise solution for data scientists and data engineers. It includes Jupyter-based notebooks to create, import, or run code using one of the three runtimes (R, Python, Scala), and RStudio, a tool for statistical analysis and machine learning with R. Functionally richer than IBM DSX Desktop, it includes additional functionality around collaboration. It has the notion of a community for sharing analytic and data assets, and Projects and ACLs for access to and governance of those assets.

Link: IBM DSX Local

In some circumstances, it may be possible to make use of a managed instance of IBM DSX running at IBM. Access to this is arranged via Analytics Lab Services. Please contact the Technical Contact listed in section 3.1 above.

3.3 Recommended tests

Test: Install your application libraries/dependencies in DSX Notebooks (Python)
Steps: Execute the following in a Notebook cell to check for existing libraries:
    !pip list --isolated
Execute the following command to install a package/dependency:
    !pip install --user <package_name>
Expected outcome: You should see a table view of the packages installed and their versions. You should see a message indicating successful install of the package.

Test: Install your application JAR files in DSX Notebooks (Scala)
Steps: Execute the following in a Notebook cell to install a custom JAR file:
    %AddJar URL_to_jar_file
Expected outcome: The JAR file should now be available for use in the Notebook.
JAR files can also be loaded from a public Maven repository using the following command (in this example we are including the dependency for org.apache.spark spark-streaming-kafka_2.10):
    %AddDeps org.apache.spark spark-streaming-kafka_2.10 1.1.0 --transitive

Test: Install R packages in RStudio or Notebooks
Steps: Execute one of the following commands in a Notebook cell or in RStudio:
    install.packages("URL_TO_PACKAGE")
or
    install.packages("PACKAGE_NAME")
Expected outcome: The R package should now be ready to use with the following command:
    library("PACKAGE_NAME")
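For the Python library test above, it can help to check programmatically which of your application's dependencies are already present before running the install command. The following is a minimal sketch, not an IBM-provided utility: it assumes a Python 3.8 or later runtime (for the standard-library `importlib.metadata` module), and the helper names and the example dependency list are hypothetical.

```python
from importlib import metadata  # standard library, Python 3.8+ (assumed runtime)


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def missing_packages(required):
    """Return the names from `required` that are not installed."""
    return [name for name in required if installed_version(name) is None]


# Hypothetical dependency list; replace with your application's packages.
required = ["pip", "a-package-that-should-not-exist-xyz"]
print(missing_packages(required))
```

Each name the function reports as missing can then be installed from a Notebook cell with the `!pip install --user <package_name>` command shown in the test steps.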

Test: PixieDust Visualization
PixieDust is an open source Python helper library that works as an add-on to Jupyter notebooks to improve the user experience of working with data. It includes a package manager to install Spark packages inside a Python notebook, and visualization capabilities to display Spark objects in different ways, including tables, charts, and maps.
Note: PixieDust currently works with Spark 1.6 or 2.0 and Python 2.7 or 3.5.
Note: PixieDust currently supports Spark DataFrames, Spark GraphFrames and Pandas DataFrames, with more to come. If you can't wait, write your own today and contribute it back.
Steps: Execute the following in a Notebook cell to visualize the data using PixieDust:
    # import pixiedust display module
    from pixiedust.display import *
    display(<SPARK_OBJECT_NAME>)
Expected outcome: You should see the PixieDust visualization in the Notebook output cell and be able to change the type of visualization depending on the type of data.

4 Submit your solution to achieve the Ready for IBM Analytics Badge

To approve your solution's validation with IBM products, we need some information from you. Please go to http://ibm.biz/r4validation, fill in the information, and submit.