IBM Data Science Experience (DSX) Partner Application Validation Quick Guide

VERSION: 1.6
DATE: Sept 27, 2017
EDITOR: D. Rangarao

Table of Contents

1 Overview of the Application Validation Process
2 Platform Specific Considerations IBM DSX Family
  2.1 Introduction
  2.2 Architecture
  2.3 Scope
3 Resources for IBM DSX Validation
  3.1 Assistance
  3.2 Platform Access
  3.3 Recommended tests
4 Partner Information
  4.1 Company Information: (required)
  4.2 Contact Information (required)
  4.3 Partner Product Overview (required)
  4.4 Operating System(s) supported: (required)
  4.5 Product Description: (required)
  4.6 Application Status and Deployment Topology (optional)
  4.7 Support Model (required)
5 Clients Jointly Using our solution (optional)
  5.1 Joint Clients number (optional)
  5.2 Joint Clients detail (optional)
6 Validation Test details (required)
  6.1 IBM DSX Features exercised in testing (required)
  6.2 Test results and interoperability issues or bugs discovered (required)
  6.3 Performance testing (optional)
  6.4 Minimum version number of your application required (required)
7 Email Completed Document

5/31/17

1 Overview of the Application Validation Process

Ready for IBM Analytics is the IBM technical validation process that helps partners enhance their technology value proposition with IBM products, achieve optimal interoperability, and help ensure client satisfaction. This Application Validation Guide contains essential information about accessing our technology, along with the test results form that must be submitted to the IBM Analytics Team for review. Once approved, you will be able to download and use the Ready for IBM Analytics mark on collateral and be included in the Business Partner Application Showcase.

Section 3.3 below prescribes some basic tests that validate basic functionality. Application providers should also execute the additional tests that they currently use for unit and regression testing. We believe that you, the application providers, are the best judge of whether your application is functioning correctly. However, should you need technical assistance from our subject matter experts, we are here to work with you. We expect that in most cases you have done appropriate levels of testing, which are far more suitable for proving application readiness than those we might attempt to provide.

We verify and archive this document, when completed, as a record of the process used to confirm that our products work together smoothly. Please be sure to return the form, which starts in section 4, to the IBM Analytics Team.

2 Platform Specific Considerations IBM DSX Family

2.1 Introduction

IBM Data Science Experience (DSX) is an interactive, collaborative environment where Data Scientists can use multiple tools to activate their insights. Data Scientists can choose to use the Jupyter-based notebook interface with a choice of runtimes including R, Python and Scala, or use the embedded RStudio.
IBM DSX is available in different configurations, including:
- DSX Desktop (installed on laptop/desktop)
- DSX Local (installed in private cloud)
- DSX on the public cloud

The focus of this document is IBM DSX Local.

2.2 Architecture

IBM DSX Local runs on a Kubernetes cluster of servers and comprises the following components:

Control Plane (Master)
- Requires three master nodes to manage the entire cluster.
- Uses etcd as a key-value store that persists the cluster state and stores metadata about cluster service deployment and health.
- Uses Prometheus for monitoring and ELK for logging.

Storage
For data stores and storage management:
- Uses GlusterFS for storage management.
- Uses IBM Cloudant DB as the service metadata database.
- Uses Redis as the in-memory database.
- Uses Swift Object Store for user artifacts.
- Uses Elasticsearch DB for logs.

Deciding DSX Local configuration
For the best resiliency, you can set up a minimum of nine nodes: three for Control plane (one master and two for high availability), three for Storage (one primary and two for high availability), and three for Compute (one primary and two for high availability). Alternatively, you can set up three nodes: one node with Control plane, Storage, and Compute on it, and two extra nodes for high availability.
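The two configurations described above can be compared with a small sketch. This is only an illustration: the role names, node names, and helper functions are hypothetical and not part of DSX Local, and the three-node layout assumes that each of the two high-availability nodes also carries all three roles.

```python
from collections import Counter

# Hypothetical role names for the three tiers named in the guide.
ROLES = ("control", "storage", "compute")


def nine_node_layout():
    """Three dedicated nodes per role: one primary plus two for high availability."""
    return {"{}-{}".format(role, i): {role} for role in ROLES for i in range(1, 4)}


def three_node_layout():
    """Three nodes in total, each assumed to carry all three roles."""
    return {"node-{}".format(i): set(ROLES) for i in range(1, 4)}


def replicas_per_role(layout):
    """Count how many nodes carry each role in a layout."""
    return Counter(role for roles in layout.values() for role in roles)


if __name__ == "__main__":
    for name, layout in (("nine-node", nine_node_layout()),
                         ("three-node", three_node_layout())):
        print(name, len(layout), "nodes", dict(replicas_per_role(layout)))
```

Under these assumptions both layouts keep three copies of every role; the nine-node layout differs in that a single node failure affects only one role.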

Figure: Architecture for a minimum of nine nodes

Figure: Architecture for a minimum of three nodes

2.3 Scope

This guide is provided for partners who want to validate their application for the IBM DSX Local product, which can be deployed both on premises and in private cloud configurations.

3 Resources for IBM DSX Validation

3.1 Assistance

If you need assistance with the process, please submit an inquiry to the Analytics Ecosystem team and we will promptly contact you to answer any questions you may have.

IBM ROLE                                    NAME                            EMAIL                 PHONE
Analytics Ecosystem -team email address-    IBM Analytics Ecosystem Team
DSX specialist                              Deepak Rangarao                 drangar@us.ibm.com    973-216-8283

3.2 Platform Access

IBM DSX Desktop Free Beta Version
IBM DSX Desktop is a free client for data scientists and data engineers. It includes Jupyter-based notebooks to create, import, or run code using one of the three runtimes (R, Python, Scala) and RStudio, a tool for statistical analysis and machine learning with R.
Link: IBM DSX Desktop
Note: In some circumstances partners might be able to use the DSX Desktop version to test their integration touch points while they wait to get access to a DSX Local installation.

IBM DSX Local Community Edition
IBM DSX Local is an on-premises enterprise solution for data scientists and data engineers. It includes Jupyter-based notebooks to create, import, or run code using one of the three runtimes (R, Python, Scala) and RStudio, a tool for statistical analysis and machine learning with R. Functionally richer than IBM DSX Desktop, it adds collaboration features: a community for sharing analytic and data assets, and Projects and ACLs for access to and governance of those assets.
Link: IBM DSX Local

Access to IBM labs managed instance of IBM DSX
In some circumstances, it may be possible to make use of a managed instance of IBM DSX running at IBM. Access to this is arranged via Analytics Lab Services. Please contact the technical contact listed in section 3.1 above.

3.3 Recommended tests

Test: Install your application libraries/dependencies in DSX Notebooks (Python)
Steps:
- Execute the following in a Notebook cell to check for existing libraries:
      !pip list --isolated
- Execute the following command to install a package/dependency:
      !pip install --user <package_name>
Expected outcome:
- You should see a table view of the packages installed and their versions.
- You should see a message indicating successful installation of the package.

Test: Install your application JAR files in DSX Notebooks (Scala)
Steps:
- Execute the following in a Notebook cell to install a custom JAR file:
      %AddJar URL_to_jar_file
- JAR files can also be loaded from a public Maven repository using the following command (this example includes the dependency spark-streaming-kafka_2.10 from org.apache.spark):
      %AddDeps org.apache.spark spark-streaming-kafka_2.10 1.1.0 --transitive
Expected outcome:
- The JAR file should now be available for use in the Notebook.

Test: Install R packages in RStudio or Notebooks
Steps:
- Execute the following command in a Notebook cell or in RStudio:
      install.packages("URL_TO_PACKAGE")
  or
      install.packages("PACKAGE_NAME")
Expected outcome:
- The R package should now be ready to use with the following command:
      library("PACKAGE_NAME")

Test: PixieDust Visualization
PixieDust is an open source Python helper library that works as an add-on to Jupyter notebooks to improve the user experience of working with data. It includes a package manager to install Spark packages inside a Python notebook and visualization capabilities to display Spark objects in different ways, including tables, charts, and maps.
Note: PixieDust currently works with Spark 1.6 or 2.0 and Python 2.7 or 3.5.
Note: PixieDust currently supports Spark DataFrames, Spark GraphFrames and Pandas DataFrames, with more to come. If you can't wait, write your own today and contribute it back.
Steps:
- Execute the following in a Notebook cell to visualize the data using PixieDust:
      # import pixiedust display module
      from pixiedust.display import *
      display(<SPARK_OBJECT_NAME>)
Expected outcome:
- You should see the PixieDust visualization in the Notebook output cell and be able to change the type of visualization depending on the type of data.
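The Python checks in the recommended tests above can also be scripted, which may be convenient when the same validation is repeated across environments. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, the supported Python versions are taken from the PixieDust note in this guide rather than from PixieDust itself, and `is_importable` only tells you that a top-level module can be found, not which version is installed.

```python
import importlib.util
import sys

# Python versions listed in this guide's PixieDust note (an assumption
# copied from the guide, not queried from PixieDust).
PIXIEDUST_PYTHONS = {(2, 7), (3, 5)}


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def python_supported_by_pixiedust(version_info=None):
    """Check a (major, minor, ...) version tuple against the guide's list."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in PIXIEDUST_PYTHONS


if __name__ == "__main__":
    # "json" ships with Python; replace it with your own module name.
    print("json importable:", is_importable("json"))
    print("Python version listed for PixieDust:", python_supported_by_pixiedust())
```

Run after `!pip install --user <package_name>`, a check like `is_importable` gives a quick pass/fail result that can be pasted into section 6.2 of the form.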

4 Partner Information

4.1 Company Information: (required)

Company Name:
Company Address:
Company Website:

4.2 Contact Information (required)

Please provide the name, title and email contact information of all relevant contacts involved in completing this validation:

Name    Role    Email Address    Phone

4.3 Partner Product Overview (required)

Product Name:
Product Version:
Product GA Date:
Notes:

4.4 Operating System(s) supported: (required)

OS Distribution Version

4.5 Product Description: (required)

Please provide a brief description of your application:

4.6 Application Status and Deployment Topology (optional)

Please answer the following questions:

Broadly speaking, how long has your application been commercially available?

Do you intend your application to be: (check all that apply)
o available as a traditional on-premises offering?
o available for private cloud deployments?

4.7 Support Model (required)

If an IBM customer opens a support issue that involves this product, what support channels exist that IBM can engage? Please provide all relevant channels of engagement, including a support portal, phone support, or email support.

Please include in this section any additional information that will help with client support, such as available online help systems for your application, relevant developer forums, etc.

Please indicate if you are a member of TSAnet (YES / NO)

5 Clients Jointly Using our solution (optional)

Do you have existing or pending clients using your solution with IBM DSX? Is there a time-critical component to completing this validation? If you are in this category, please provide a description of the joint solution. If client anonymity is required, please use generic descriptions like "an insurance company" or "a large bank". If you do not have any existing or prospective joint clients, please skip this section.

5.1 Joint Clients number (optional)

How many joint clients do you believe you have in production today?

5.2 Joint Clients detail (optional)

Do you have joint clients who are reference clients for you? Please provide details and indicate whether you would like to use them as evidence of your validation in lieu of new testing.

6 Validation Test details (required)

Briefly describe the process used to verify interoperability of your product with IBM DSX.

Please specify which configuration and version of IBM DSX you certified on:

IBM DSX Desktop version:        OS:
IBM DSX Local version:        build#:        OS:
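When filling in the OS fields above, it can help to record the environment from inside the notebook where the tests ran. The sketch below is one possible way to do that with the Python standard library; the function names are hypothetical, and it does not report the IBM DSX version or build number, which still need to be entered by hand.

```python
import platform


def environment_report():
    """Collect basic runtime details from the current Python session."""
    return {
        "python_version": platform.python_version(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
    }


def format_report(report):
    """Render the report as 'key: value' lines, sorted by key."""
    return "\n".join("{}: {}".format(key, report[key]) for key in sorted(report))


if __name__ == "__main__":
    print(format_report(environment_report()))
```

The printed lines can be attached to the form together with any other test logs.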

6.1 IBM DSX Features exercised in testing (required)

Please list the major IBM DSX features that were exercised in your testing. For instance, does your application work in Jupyter notebooks or RStudio, or integrate with IBM DSX value-adds such as PixieDust?

6.2 Test results and interoperability issues or bugs discovered (required)

Please describe the results of your testing. In particular, did you find any problems with the interoperation of IBM DSX and your application? Please describe them and include PTR numbers if they have been created for IBM to address defects in our products. (Attach any printouts or logs of test results to this form as needed.)

6.3 Performance testing (optional)

Please characterize performance results or findings you would like to highlight, if any.

6.4 Minimum version number of your application required (required)

Is there a minimum version number of your product required to interoperate with IBM DSX?

7 Email Completed Document:

Please send this completed form to: IBM Analytics Ecosystem Team

THANK YOU.