SAS Enterprise Miner: What does the future hold?


SAS Enterprise Miner: What does the future hold?
David Duling, EM Development Director, SAS Inc.
Sascha Schubert, Product Manager Data Mining, SAS International

Topics for Discussion
- EM 4.2/SAS 9.0: AF/SCL architecture
- EM 5.0/SAS 9.1: 3-tier architecture
- EM demo of the alpha EM 5.0 Java UI

EM: Two Paths for Two Goals
- Evolutionary development of data mining functionality
  - Keep up the quality
  - Upgrade release for current sites
  - Stay on top of the market
- Revolutionary development of data mining architecture
  - Address scalability and performance
  - Address the limitations of current architecture
  - Make new architecture future-proof
Copyright 2002, SAS Institute Inc. All rights reserved.

Time Line: Project Mercury + DM
[Timeline figure. Dates shown: Apr 02, Jun 02, Nov 02, Feb 03]
- SAS V9, EM 4.2 (evolutionary release): EA, LA, GA
- SAS V9.1, EM 5.0 (revolutionary release): DP, EA, LA, GA

Goals for EM 4.2
- Maintain current product
- Fix known defects
- Evolve beta tools to production status
- Interactive Grouping
- Improve scalability (parallel processing)

EM 4.2: Evolve Beta Tools to Production Status
- Memory Based Reasoning
- DM Neural
- Two-Stage Model
- Time Series
- Link Analysis
- J-Score, XML

Interactive Grouping Node
- Was developed as part of Credit Scoring Solution
- Will be fully integrated in EM 4.2 / 5.0
- Used to calculate weights of evidence
  - also useful for general interactive grouping
- Interactive grouping of variables into natural groups in relation to target
  - now possible for class and interval variables
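Weight of evidence is commonly defined, per group, as the natural log of the ratio between the group's share of all non-events and its share of all events. The sketch below assumes that convention (some texts invert the ratio) and uses hypothetical class and method names; it is an illustration of the calculation, not the Interactive Grouping node's code.

```java
/** Illustrative weight-of-evidence calculation; names are hypothetical. */
final class WeightOfEvidence {

    /**
     * Returns ln((nonEvents[i] / totalNonEvents) / (events[i] / totalEvents))
     * for each group i. Assumes every group holds at least one event and one
     * non-event, so the logarithm is always defined.
     */
    static double[] compute(long[] events, long[] nonEvents) {
        if (events.length != nonEvents.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        long totalEvents = 0;
        long totalNonEvents = 0;
        for (int i = 0; i < events.length; i++) {
            if (events[i] <= 0 || nonEvents[i] <= 0) {
                throw new IllegalArgumentException("group " + i + " has an empty cell");
            }
            totalEvents += events[i];
            totalNonEvents += nonEvents[i];
        }
        double[] woe = new double[events.length];
        for (int i = 0; i < events.length; i++) {
            double shareOfNonEvents = (double) nonEvents[i] / totalNonEvents;
            double shareOfEvents = (double) events[i] / totalEvents;
            woe[i] = Math.log(shareOfNonEvents / shareOfEvents);
        }
        return woe;
    }
}
```

Under this convention a positive value marks a group with proportionally fewer events than the population, and a group whose two shares are equal gets a weight of zero.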

Publishing Enterprise Miner Models via the Open Meta Server
[Diagram. Components: Enterprise Miner, Open Meta Server, WWW Server, WWW clients. Labeled actions: Save, Register, Read, HTTP/JSP, Search Models, Retrieve Models. Artifacts: Reports, Score code]

Mining Model Repository
- SAS Code, C Code, Java Code
- Statistics, Charts, Reports
- Input and Output Variables described in XML
- Process flow report in HTML format
- Fit and assessment statistics in SAS data sets
- Cscore code
- Cscore meta information stored in XML
- Fit and assessment statistics stored in CSV
- Target and input data set info stored in text
- Formats, score, and macro code as SAS code
- Metadata info about the model in a SAS catalog

Performance and Scalability
- XOT
  - enables parallel input (read) of partitioned data sets
  - Using XOT for data I/O
- TK (Threaded Kernel)
  - Multi-threading, making use of multiple CPUs
  - TK for PROC DMDB, PROC DMINE (Vsel), PROC DMREG
  - Optional for all listed procedures
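The general idea behind partitioned reads and threaded computation can be shown outside SAS. The sketch below is plain Java, not XOT or TK: each partition is handed to its own task, which computes a partial sum and count, and the partial results are then combined into a mean. The class name, method name, and data layout are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: mean of a partitioned column, one task per partition. */
final class PartitionedMean {

    static double mean(List<double[]> partitions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per partition; each returns {partialSum, rowCount}.
            List<Future<double[]>> partials = new ArrayList<>();
            for (double[] partition : partitions) {
                Callable<double[]> task = () -> {
                    double sum = 0.0;
                    for (double value : partition) {
                        sum += value;
                    }
                    return new double[] {sum, partition.length};
                };
                partials.add(pool.submit(task));
            }
            // Combine the partial results on the calling thread.
            double sum = 0.0;
            long count = 0;
            for (Future<double[]> partial : partials) {
                double[] result = partial.get();
                sum += result[0];
                count += (long) result[1];
            }
            if (count == 0) {
                throw new IllegalArgumentException("no observations");
            }
            return sum / count;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Only statistics that can be merged from partial results (sums, counts, minima, maxima) parallelize this simply, which is one reason a summary step is a natural first candidate for threading.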

Scale-Up: PROC DMINE
[Chart: time versus number of threads (0 to 8), comparing XOT-TK with unthreaded runs. Machine: Stones (S64), 64-bit Solaris, 8 CPUs]

Benchmarking TK (PROC DMDB)

                                             Single-threaded        Multi-threaded (4 threads)
Data set                                     real (s)   cpu (s)     real (s)   cpu (s)
100K obs, 100 interval vars                  7.77       7.77        1.95       4.82
100K obs, 50 interval vars, 50 class vars    26.80      26.81       1.95       4.82
100K obs, 50 class vars                      22.69      22.68       12.48      29.00
5M obs, 2 interval vars                      6.50       6.50        1.51       4.92
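Speedup and parallel efficiency follow directly from the real-time figures. A minimal sketch, using the first case as listed (7.77 seconds single-threaded, 1.95 seconds with 4 threads); the class and method names are illustrative only.

```java
/** Illustrative speedup arithmetic for a threaded benchmark. */
final class Speedup {

    /** Speedup = single-threaded real time / multi-threaded real time. */
    static double speedup(double singleSeconds, double multiSeconds) {
        return singleSeconds / multiSeconds;
    }

    /** Efficiency = speedup / thread count; 1.0 would be perfect scaling. */
    static double efficiency(double singleSeconds, double multiSeconds, int threads) {
        return speedup(singleSeconds, multiSeconds) / threads;
    }

    public static void main(String[] args) {
        // First benchmark case as listed: 100K obs, 100 interval vars.
        System.out.println("speedup    = " + speedup(7.77, 1.95));
        System.out.println("efficiency = " + efficiency(7.77, 1.95, 4));
    }
}
```

For that case the speedup is roughly 3.98 on 4 threads. The multi-threaded CPU time (4.82 seconds) is lower than the single-threaded CPU time there, whereas in the class-variable case it is higher (29.00 versus 22.68 seconds), so the benefit of threading clearly depends on the data.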

EM 5.0: The Future of Enterprise Miner

Plans for EM 5.0
- Create a new 3-tier architecture
  - SAS server
    - Batch and interactive modes
    - Use existing tools and expertise
  - Java foundation services
    - Metadata services
    - Configuration management
  - Java client
    - API: integration projects
    - GUI: Swing-based
- Data mining from everywhere

Goals for EM 5.0

Create a new EM 5.0
- SAS server
  - Batch and interactive modes
  - Use existing tools and expertise
- Java middleware
  - Metadata services
  - Configuration management
- Java client
  - API: integration projects
  - GUI: Swing-based

New procedures
- PATH: production
- ARBOR: production (replace split)
- TAXONOMY: experimental
- SVM: experimental

Production version of MFC Tree viewer
- PROC ARBOR
- IOM procedure interface for interactive training

Production Model Repository
- EM 5.0 model registration
- EM 4.2 model registration
- Web GUI
- Warehouse Admin.
- Scoring

Current AF/SCL Architecture
[Diagram labels: Project persistence; SAS Server; Data Persistence; SAS Version 8.2; EM 4.x classes; SAS Version 8.2; SAS EM Client]
- SAS AF/SCL infrastructure
- Project stored locally on the Windows client as well as the SAS installation
- EM models trained on EM server (single-threaded)

Distributed Architecture in EM 5.0: Data Mining
[Diagram labels: Compute Server; Project Data Persistence; SAS System; Metadata Persistence; EM 5.0 Java API; EM 5.0 Java UI; Java EM Client; Middleware Server; EM 5.0 Java Middleware]

Distributed Architecture in EM 5.0: Reporting
[Diagram labels: Project Data Persistence; Compute Server; SAS System; Metadata Persistence; EM 5.0 Java API; EM 5.0 Java UI; Middleware Server; EM 5.0 Java Middleware; JSP Server; SAS Open Metadata Server; Web Client]

Distributed Architecture in EM 5.0: Warehousing
[Diagram labels: Compute Server; Project Data Persistence; SAS System; Metadata Persistence; EM 5.0 Java API; EM 5.0 Java UI; Middleware Server; EM 5.0 Java Middleware; JSP Server; SAS Open Metadata Server; Web Client; Data Builder Java Client]

EM 5.0 Configuration Options
- Stand-alone client
  - SAS server, Java middleware, and GUI on the same machine
- Client server
  - SAS server and Java middleware server; clients connect through the Java GUI
- Distributed computing
  - All components on different machines; users connect from anywhere

Reasons for n-tier Architecture
[Diagram labels: Client 1; Client 2; SAS Server; EM Server; OMS]
- Central administration
- Easier thin-client deployment
- Reduce client footprint
- Offers centralized location for file storage
- Improved security control of all login processes
- Easier configuration
- More persistence options controlled by administrator
- Better resource monitoring
  - Who's using the system
  - How many processes are running

New GUI Based on Java Swing
- Improved graphics
- Deployed through the web, allowing multiple user access
- Platform independent
- Server independent
- Configurable
- On-line help
- Extendable
- XML import/export of diagrams
- Start and stop processes

Sample EM 5.0 Results
[Screenshots: Exploratory Plots; Assessment Plots]

Interactive Tree Results Viewer

EM 5.0 Reporting
- SPK = SAS Publish and Subscribe
- SAS distributes a package reader
- Tables stored as CSV files => activate MS Excel
- Can be registered in OMS and Model Repository

Enhanced Performance
- Uses MP CONNECT technologies to distribute mining processes across multiple CPUs, providing the ability to run nodes in parallel.
- DMINE and DMREG procedures have been reengineered to take advantage of the TK and XOT frameworks of V9.
- Supports Stop Processing of an EM process.

EM 5.0 Performance
- GUI sessions get dedicated SAS/IOM workspace
- Model training gets dedicated SAS/IOM workspace
- Parallel branches in process flow run in dedicated SAS/IOM workspaces
- XOT procedures with spds libname engine start multiple data read threads
- TK-enabled procedures start multiple computational threads

[Diagram labels: User 1; User 2; Middleware; IOM user session: user1; IOM user session: user2; IOM process session: user2; SAS Server; SAS: Train Model 1; SAS: Train Model 2; tk 1, tk 2, tk 3, tk 4; Server Operating System; CPU (x4)]

Event                                   Threads   Total
User 1 connects                          1        1
User 2 connects                          1        2
User 2 starts process                    1        3
User 2 disconnects                      -1        2
Process starts model 1 training          1        3
Process starts model 2 training          1        4
Model 2 starts four threads running      4        8
Model 2 completes                       -4        4
Process completes                       -3        1
User 2 reconnects                        1        2
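The Total column in the event table is a running sum of the per-event thread changes. A small sketch that replays the deltas from the table (the class and method names are hypothetical):

```java
/** Replays the per-event thread changes and keeps a running total. */
final class ThreadLedger {

    /** Returns the cumulative sum of the deltas, one entry per event. */
    static int[] runningTotals(int[] deltas) {
        int[] totals = new int[deltas.length];
        int total = 0;
        for (int i = 0; i < deltas.length; i++) {
            total += deltas[i];
            totals[i] = total;
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] events = {
            "User 1 connects", "User 2 connects", "User 2 starts process",
            "User 2 disconnects", "Process starts model 1 training",
            "Process starts model 2 training", "Model 2 starts four threads running",
            "Model 2 completes", "Process completes", "User 2 reconnects"
        };
        int[] deltas = {1, 1, 1, -1, 1, 1, 4, -4, -3, 1};
        int[] totals = runningTotals(deltas);
        for (int i = 0; i < events.length; i++) {
            System.out.println(events[i] + ": " + deltas[i] + " -> " + totals[i]);
        }
    }
}
```

The peak of 8 threads occurs while model 2 runs its four computational threads alongside the two user sessions, the process session, and model 1 training.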

EM 5.0 Batch Processing
- Java API/UI for batch processing
  - Runs in middleware
  - Opens existing workspace and starts training process
  - Loads XML diagram files
- XML files API
  - Save entire diagrams as XML files
  - Mail from one user to another
  - Scheduled execution
  - %EM5(xmlfile=) macro for running diagrams
- Data set API
  - Nodes data set: all nodes and properties
  - Connections data set: flow of logic from one node to another
  - Actions data set: nodes and actions to perform on nodes
  - Workspace data set: library and file locations
  - Variables meta data sets: input, target, rejected, etc.
  - %EM5(nodes=,connect=, ) macro for running diagrams

EM 5.0 Batch Processing
- Compatible with all EM5 file structures
- Run the same diagram from UI or batch
- Automate model training from diagrams built in the GUI
- All SAS language capabilities
- Encapsulates EM processing
- BATCH.SAS always created for every node
- Automate creation of new diagrams
- Distribute diagrams
- Consulting: initial setup and delivery
- May include results, or not

EM 5.0 Batch Processing
- API to allow Java programs to call EM:
    String ids_id = myworkspace.addnode("Datasource");
    String reg_id = myworkspace.addnode("Regression");
    myworkspace.connectnode(ids_id, reg_id);
    myworkspace.runnode(reg_id);
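The slide does not show the type of myworkspace or what the calls return beyond a String id. To make the call sequence something that compiles and can be stepped through, the sketch below supplies a hypothetical in-memory stand-in with the same four calls. It is not the EM 5.0 API; the id format and the rule that running a node first runs its predecessors are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the workspace object; NOT the real EM 5.0 API. */
final class DemoWorkspace {
    private final Map<String, List<String>> predecessors = new LinkedHashMap<>();
    private final List<String> runLog = new ArrayList<>();
    private int nextId = 1;

    /** Adds a node of the given type and returns a generated id (format assumed). */
    String addnode(String type) {
        String id = type + "_" + nextId++;
        predecessors.put(id, new ArrayList<>());
        return id;
    }

    /** Records that toId consumes the output of fromId. */
    void connectnode(String fromId, String toId) {
        if (!predecessors.containsKey(fromId) || !predecessors.containsKey(toId)) {
            throw new IllegalArgumentException("unknown node id");
        }
        predecessors.get(toId).add(fromId);
    }

    /** "Runs" a node after all of its predecessors (assumed behaviour). */
    void runnode(String id) {
        if (!predecessors.containsKey(id)) {
            throw new IllegalArgumentException("unknown node id: " + id);
        }
        run(id, new LinkedHashSet<>());
    }

    private void run(String id, Set<String> inProgress) {
        if (runLog.contains(id)) {
            return; // already executed
        }
        if (!inProgress.add(id)) {
            throw new IllegalStateException("cycle detected at " + id);
        }
        for (String upstream : predecessors.get(id)) {
            run(upstream, inProgress);
        }
        runLog.add(id);
    }

    /** Order in which nodes were executed. */
    List<String> runLog() {
        return runLog;
    }

    public static void main(String[] args) {
        DemoWorkspace myworkspace = new DemoWorkspace();
        String ids_id = myworkspace.addnode("Datasource");
        String reg_id = myworkspace.addnode("Regression");
        myworkspace.connectnode(ids_id, reg_id);
        myworkspace.runnode(reg_id);
        System.out.println(myworkspace.runLog());
    }
}
```

The same node-plus-connection structure is what the data set API describes with its Nodes and Connections data sets, so a diagram built through either route reduces to a directed graph that is executed upstream-first.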

Integrated with OMS and Data Builder
- OMS persists metadata about SAS servers, EM project locations, results packages, and data dictionaries for training tables.
- Scoring processes as well as input/output data sets can be defined and exchanged with other SAS companion products through registration of EM metadata and processes within the SAS OMR.

Other Major Enhancements
- New mining algorithms:
  - Support Vector Machines: popular algorithm for general classification problems
  - Web Path Analysis: provides efficient and scalable mining of frequent paths from click-stream data
  - Taxonomy: supports hierarchical associations to populate rules at different levels in the hierarchy
- Improved decision tree algorithm to enable interactive training on the server and provide improved performance on disk-resident data

New Procedures
- PROC PATH
- PROC SVM
- PROC ARBOR
- PROC TAXONOMY

New Path node (production)
- PROC PATH: a new procedure to mine frequent paths from preprocessed click-stream data
- Features:
  - Efficient, scalable, and fast
  - Path completion: reintroduce missing requests (e.g., back button clicks)
  - Detecting path breaks: identify separate sub-paths
  - Generating longest contiguous sub-paths
  - Correctly handling page reload requests
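Two of the listed features can be illustrated on toy sessions: treating consecutive requests for the same page as reloads, and counting how many sessions contain each contiguous sub-path. The sketch is a simplification with hypothetical names and does not reproduce PROC PATH's algorithm; path completion is left out because it needs referrer information that this example does not model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy click-stream helpers; not PROC PATH. */
final class ClickPaths {

    /** Drops consecutive repeats of the same page, treating them as reloads. */
    static List<String> collapseReloads(List<String> session) {
        List<String> cleaned = new ArrayList<>();
        for (String page : session) {
            if (cleaned.isEmpty() || !cleaned.get(cleaned.size() - 1).equals(page)) {
                cleaned.add(page);
            }
        }
        return cleaned;
    }

    /**
     * For each contiguous sub-path of the given length, counts the number of
     * sessions that contain it at least once (after collapsing reloads).
     */
    static Map<List<String>, Integer> subPathSupport(List<List<String>> sessions, int length) {
        if (length < 1) {
            throw new IllegalArgumentException("length must be at least 1");
        }
        Map<List<String>, Integer> support = new HashMap<>();
        for (List<String> raw : sessions) {
            List<String> session = collapseReloads(raw);
            Set<List<String>> seenInSession = new HashSet<>();
            for (int start = 0; start + length <= session.size(); start++) {
                seenInSession.add(new ArrayList<>(session.subList(start, start + length)));
            }
            for (List<String> path : seenInSession) {
                support.merge(path, 1, Integer::sum);
            }
        }
        return support;
    }
}
```

A sub-path would then be called frequent when its support reaches a chosen threshold; that threshold is a modeling choice not specified in the slides.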

Path Analysis
- Improved customer experience
- Tuning web-site structure based on browsing patterns
- Build customer relationships
- Customizing content at individual or segment level
- Real-time target marketing
- Cross-sell, up-sell product recommendations
- Ad/Rebate placement
- Predict site abandonment
- Browsing behavior as input to predictive modeling
- Segmentation based on browsing behavior

Support Vector Machines (experimental)
- Supervised learning tool for creating functions from a set of labeled training data
  - A binary classifier
  - A general regression function
- Applications
  - Suitable for general classification problems
  - Text categorization
  - Biosequence analysis; microarrays

SVM
- Classification is achieved by a linear or nonlinear separating surface in the input space of the data set.
- Linear SVMs operate by finding a hypersurface in the space of possible inputs. This hypersurface attempts to split the positive examples from the negative examples. The split is chosen to have the largest distance from the hypersurface to the nearest of the positive and negative examples.
- If the training examples are not linearly separable, SVMs work by mapping the training data into a higher-dimensional feature space using an appropriate kernel function.
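For the linear case, the decision rule and the distance being maximized can be written out directly. The sketch below assumes a separating surface that is already given by a weight vector w and an offset b (hypothetical values, not output of PROC SVM), and includes a Gaussian (RBF) kernel as one common choice of kernel function for data that are not linearly separable. Training, which is the hard part, is not shown.

```java
/** Illustrative pieces of a linear SVM; training is not shown. */
final class LinearSvmSketch {

    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Signed score w.x + b; its sign gives the predicted class. */
    static double score(double[] w, double b, double[] x) {
        return dot(w, x) + b;
    }

    /** Returns +1 for the positive side of the surface, -1 otherwise. */
    static int classify(double[] w, double b, double[] x) {
        return score(w, b, x) >= 0 ? 1 : -1;
    }

    /** Geometric distance from x to the surface w.x + b = 0. */
    static double distance(double[] w, double b, double[] x) {
        return Math.abs(score(w, b, x)) / Math.sqrt(dot(w, w));
    }

    /** Gaussian (RBF) kernel exp(-gamma * ||x - y||^2), one common kernel choice. */
    static double rbfKernel(double[] x, double[] y, double gamma) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double squaredDistance = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            squaredDistance += diff * diff;
        }
        return Math.exp(-gamma * squaredDistance);
    }
}
```

The margin of a given surface is the smallest such distance over all training examples; a linear SVM picks w and b so that this smallest distance is as large as possible.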

Other New Nodes/Procedures
- Taxonomy: hierarchical associations (experimental)
- ARBOR: replacement for SPLIT
  - Supports client/server interactive training
    - As an interactive procedure
    - As an engine for a client-side Windows application
  - Improved performance of disk-resident data
  - Documented at the level of SAS/STAT procedures
- All procedures will use a dynamic DMDB
  - No permanent physical DMDB data set is created

Early Adopters for EM 5
- Looking for early adopters in SeUGI time frame
- 5 to 20 sites worldwide, recommended from local offices
- Different regions and different industries
- Following scenarios

Early Adopters for EM 5
- Following scenarios desired:
  - site to distribute the EM Java thin client to multiple, geographically dispersed users to test the 3-tier architecture
  - small to medium-sized firm to evaluate EM 5.0 running entirely on a local client
  - site to test the Java API to integrate EM analytics and scoring services into site-specific mining applications
  - site to test EM analytical deployment and the Model Repository
  - sites with excellent statistical/AI modeling skills and applications to evaluate the new algorithms (SVM, Path analysis node, Interactive Tree, Hierarchical Associations)

EM 5.0 Summary
- Delivered as a modern, distributed client-server system for data mining
- Enables wide-area collaboration on data mining projects and extensive integration opportunities
- SAS server uses new parallel and multi-processing features of the SAS V9.0 system and includes an API for running data mining processes and for adding new data mining tools.
- Java middleware manages SAS server sessions, user identity, metadata, and report delivery.
- Data mining sessions can be created and managed through a Java API.
- The user interface is based on Java Swing libraries containing advanced graphics and visualization techniques
- New mining algorithms

EM Summary
- Provide renowned data mining functionality based on modern, future-proof architecture
- Clear differentiation between data processing, metadata management, and flexible user interface
- Architecture open for integration with other SAS and third-party applications
- Ensure backward compatibility by parallel maintenance of traditional AF solution

Other Data Mining Presentations at SeUGI
- Wed, 16:25, TKC: Distributed Data Mining with SAS Enterprise Miner
- Wed, 11:40, Analytical Expertise stream: SAS Text Miner
- Wed, 17:05, TKC: SAS Text Mining
- Analytical Demo Station in TKC

DEMO