Deploying, Managing and Reusing R Models in an Enterprise Environment: Making Data Science Accessible to a Wider Audience
Lou Bajuk-Yorgan, Sr. Director, Product Management, Streaming and Advanced Analytics, TIBCO Software
DISCLAIMER During the course of this presentation, TIBCO or its representatives may make forward-looking statements regarding future events, TIBCO's future results or our future financial performance. Although we believe that the expectations reflected in the forward-looking statements contained in this presentation are reasonable, these expectations or any of the forward-looking statements could prove to be incorrect and actual results or financial performance could differ materially from those stated herein. TIBCO could experience factors that could cause actual results or financial performance to differ materially from those contained in any forward-looking statement made in connection with this presentation. TIBCO does not undertake to update any forward-looking statements that may be made from time to time or on its behalf. This document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and availability dates for TIBCO products and services. This document is provided for informational purposes only and its contents are subject to change without notice. TIBCO makes no warranties, express or implied, in or relating to this document or any information in it, including, without limitation, that the information is error-free or meets any conditions of merchantability or fitness for a particular purpose. This document may not be reproduced or transmitted in any form or by any means without our prior written permission. The material provided is for informational purposes only, and should not be relied on in making a purchasing decision. The information is not a commitment, promise or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for our products remains at our sole discretion.
Multiple paths from Data to Decision and Action. Data Science helps deliver better decisions faster.
Scarcity of Data Science Skills
[Figure: user groups shown as General Population; Citizen Data Scientists (Analysts, Engineers, Scientists); Data Scientists. Axes: number of users; analytical complexity of task and capability of user.]
Skeptical about Citizen Data Scientists?
Citizen Data Scientists aspire beyond pretty pictures and simplistic dashboards.
By 2019, citizen data scientists will surpass data scientists in the amount of advanced analysis produced.
By 2020, more than 40% of data science tasks will be automated, resulting in increased productivity and broader usage by citizen data scientists.
This is the trend. How do we make sure people have the right tools to get the right answers?
Source: http://www.gartner.com/newsroom/id/3570917
Where does R fit in?
Pros: Easy prototyping of new models and analysis. Huge array of analytic methods available, so the best method to solve a given problem is likely available. Lots of people learning R in university.
Cons: Performance: not designed for real time or Big Data applications. Hard for non-Data Scientists to use directly, which exacerbates the Data Science skills scarcity by requiring both coding and Data Science knowledge. Challenging to deploy, integrate and manage in enterprise applications. Performance, commercial support and Intellectual Property concerns.
Result: Compromises which impact agility. Recode in a new, less agile environment. Rewrite, or use specialized R packages to solve one problem better.
TIBCO Analytics
INSIGHT (Data Discovery): TIBCO Spotfire, for Business Users and Citizen Data Scientists. Create and publish R models and scripts to Spotfire Library, with authorship, user access control, etc. Embed R models and scripts in Spotfire visualizations for wider use. Deploy to the cloud and web-based applications.
MODEL (Analytics; Numerical Models, Analytic Apps): TIBCO Statistica and TIBCO Enterprise Runtime for R (TERR), for Data Scientists. TERR is a commercially supported, proprietary engine for the R language, built for high performance. TERR is embedded in TIBCO products for native R scripting. Statistica provides model governance: authorship, user access control, version tracking, etc.
ACTION (Real time): TIBCO Streambase, for Developers. Call embedded models for real-time scoring. Model deployment via centralized service, with authorship, user access control, approval to deployment, etc. Update models in live real-time applications.
Copyright 2000-2017 TIBCO Software Inc.
FIND AND ACT ON "CRITICAL BUSINESS MOMENTS"
Deliver proactive customer service; smart cross-sell offers; predict impending equipment failure; real-time inventory management; optimize pricing; prevent fraud; optimize routes; anticipate and handle disruptions.
Critical business moments occur in every facet of enterprise operations. They drive competitive differentiation, customer satisfaction and business success!
#1. Smart Visual Analytics: Recommendation-driven insights. Visual analytics is like a bicycle for your business mind.
TIBCO Spotfire Visual Analytics
Smart: recommendation-driven insights. Multiple dynamic perspectives, no old-school single page. Fastest in- and out-of-memory data engine for data big and small. Rich, multilayer, accurate maps. Threaded, searchable conversations with annotations and bookmarks. Easily configured, process-specific analytic applications. Over 40 relational, big data, cloud & proprietary sources.
#2. Numerical Models Analytic Apps
Point and Click Data Science
Contextual, one-click calculations make powerful methods easy to use: descriptive stats, similarity, clustering, correlations, fitting, forecast. Unique commercial engine for the R language: TIBCO Enterprise Runtime for R (TERR). Any statistic can be part of Spotfire visual aggregations or expression language. Easily leverage the work of your Data Scientists from R, Statistica, SAS, Matlab, Python. Access to Machine Learning and Deep Learning platforms. TIBCO Community shares data science components.
Embedded TERR in Spotfire
Write R code directly in Spotfire; TERR executes locally or on server. Manage TERR analytics locally or in Server to reuse across community. Deploy TERR-powered applications to the web.
Draw layers on Spotfire Maps with R/TERR scripts Spotfire TERR Data Function contourlines(x,y)
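As a rough illustration of the kind of script such a data function might wrap, the sketch below computes contour lines from a gridded surface and flattens them into a table of vertices that could be drawn as a line layer. It uses open-source R's contourLines() from grDevices, which takes a z matrix in addition to x and y; the slide's "contourlines(x,y)" is presumably shorthand. The surface, the variable names and the output layout are all hypothetical, and running this unchanged under TERR or wiring it into a Spotfire map layer is an assumption, not something the slide confirms.

```r
# Hypothetical gridded surface. In a Spotfire data function these would be
# input parameters supplied by the analysis, not built inside the script.
x <- seq(-2, 2, length.out = 50)
y <- seq(-2, 2, length.out = 50)
z <- outer(x, y, function(a, b) exp(-(a^2 + b^2)))  # smooth bump centred at (0, 0)

# contourLines() returns a list; each element holds a level and the
# x/y vertices of one contour line at that level.
lines <- grDevices::contourLines(x, y, z, levels = c(0.25, 0.5, 0.75))

# Flatten to a single data frame (one row per vertex) so the result can be
# returned as a table: line_id groups the vertices belonging to one line.
contours <- do.call(rbind, lapply(seq_along(lines), function(i) {
  data.frame(line_id = i,
             level   = lines[[i]]$level,
             x       = lines[[i]]$x,
             y       = lines[[i]]$y)
}))
```

Returning one row per vertex with a line identifier keeps the output tabular, which is the shape a table-oriented tool can consume most directly; other layouts (for example, one WKT geometry per line) would be equally reasonable.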
Power of Embedded Advanced Analytics
TIBCO Spotfire with H2O Integration. Example: Predictive Analytics for Manufacturing (scrap parts as early as possible)
TIBCO Statistica Analytic Apps
Comprehensive Stats and Predictive Analytics: simple UX; 1000s of stats, machine and deep learning, Bayesian methods.
Algorithm marketplaces: Azure ML, Algorithmia, Apervita, H2O.
Open source: R, Python, C#, Spark, H2O, CNTK Deep NN.
Data Blending: any data, anywhere.
Model & Rule Lifecycle Management: create workspace, manage, version control, deploy, embed.
Citizen Data Scientists: scale best practices with Web UI.
IoT Analytics: device and gateway publish, scoring.
Security & Governance: repeatable, auditable; GxP validation: audit logs, version control.
Non-traditional data: image & audio; text mining; Network Analytics with OrientDB; in-database analytics.
TIBCO Statistica: Highlights
Audiences shown: Business User, Data Scientist, Developer.
Simple UX for Data Scientist: drag-and-drop UI for model + rule creation and deployment; simplified data preparation, mash-up, and ETL.
Comprehensive palette of math and analytics: machine learning, deep learning, Bayesian methods; image, audio, text, graph-db; in-db and in-memory algorithms; flexible integration with R, Python, Scala, SAS, C++, C#, Java.
Model & Rule Management and Deployment: metadata repository for model & rule version control, governance, security and audit trail; model version and rule lineage; champion/challenger; model & rule publish and embed everywhere; publish to TIBCO Streambase for streaming analytics on live data feed; IoT applications: publish to edge.
#3. Streaming Analytics: Continuous algorithmic awareness & automation
Streaming Analytics with Spotfire and TERR
LiveView dashboard, alerting, real-time visualizations. Spotfire visualization for context; drill down for root cause analysis.
Streaming Analytics
Low/no code workflows for accessing, transforming and acting on real-time data. Visual, powerful, scalable, fast, extensible. Score R models in real-time applications using the native TERR node (+PMML, SparkML, H2O, etc.). Deploy models via centralized service, with approval before production deployment.
community.tibco.com
Spotfire Wiki: community.tibco.com
Spotfire Machine Learning Community: Spotfire (R) Data Functions
Machine Learning / Deep Learning; Gradient Boosting; Random Forests; Anomaly Detection: Autoencoder; Segmentation; Propensity; Affinity; Non-Linear Regression; Decline Curves; Modeling & Simulation; Genetic Algorithms; Optimization.
Summary
More demand than ever for Data Science, with too few skilled Data Scientists.
Rise of Citizen Data Scientists, who need the right tools, guidance and frameworks.
Importance of leveraging the work of Data Scientists: R is a key part of the solution, if R models can be managed, deployed, embedded and reused.
TIBCO Analytics: easy to embed/leverage/deploy the work of Data Scientists, from R and beyond. In Spotfire Visual Applications, used by business users and Citizen Data Scientists. In real-time applications, to automate decision making.
Easier for Data Scientists to create and reuse predictive analytics in Statistica, while leveraging the best of open source R, Python, etc.
Rich community with examples, reusable assets, etc.
All while maintaining necessary analytic governance and model management.