ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA


INSIGHTS@SAS: ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA

AGENDA
09.00-09.15 Intro
09.15-10.30 Analytics using SAS Enterprise Guide (Ellen Lokollo)
10.45-12.00 Advanced Analytics using SAS Enterprise Miner (Rens Feenstra)
12.00-13.00 Lunch
13.00-14.15 Advanced programming: to get better performance from your SAS code (Alfredo Iglesias Rey)
14.30-16.00 ABN AMRO presents Cees Harlaar Project INSPIRE Arthur Usov Dynamic Linear Modelling From Data to Insights Pim Veeger SAS on Linux Leon Ellermeijer SAS Improvements Project
16.00-16.15 Wrap up


AGENDA
- Advanced Analytics / Data mining / Machine learning
- SEMMA
- Rapid Predictive Modeler
- Enterprise Miner
- High Performance Analytics
- R integration
- Text Miner
- Analytic Lifecycle

ADVANCED ANALYTICS HOW ADVANCED IS ADVANCED?

Machine Learning

ADVANCED ANALYTICS: WHAT IS MACHINE LEARNING?
Machine learning is a branch of artificial intelligence that automates the building of systems that learn iteratively from data, identify patterns, and predict future results with minimal human intervention. It shares many approaches with other related fields, but it focuses on predictive accuracy rather than interpretability of the model.

ADVANCED ANALYTICS: MACHINE LEARNING IS NOT A NEW DISCIPLINE
[Figure: overlapping fields. Labels: Statistics, Pattern Recognition, Computational Neuroscience, Data Science, Data Mining, AI, Databases, Machine Learning, KDD. Graphic from the SAS Data Mining Primer course in 1998.]

ADVANCED ANALYTICS: MACHINE LEARNING INCLUDES A COMPREHENSIVE SET OF METHODS
Latest techniques (complex, can be more accurate):
- Local search optimization
- k-means clustering
- Bayesian networks
- Gradient boosting
- Deep learning
- Random forests
Traditional (easy to explain, often good enough):
- Decision trees
- Regression
- Neural networks
- Principal components
- Model ensembles
Also shown: Support vector machines
SAS covers the full range from regression to deep learning.

ADVANCED ANALYTICS: WHY IS MACHINE LEARNING SO IMPORTANT NOW?
Data, computing power, algorithms

ANALYTICS
- TEXT ANALYTICS: Finding treasures in unstructured data, like social media or survey tools, that could uncover insights about key business challenges
- FORECASTING: Leveraging historical data to drive better insight into proactive decision-making
- OPTIMIZATION: Analyze massive amounts of data in order to accurately identify areas likely to produce the most profitable results
- DATA MINING / MACHINE LEARNING: Mine transaction databases to create models of likely outcomes
- STATISTICS
Foundation: Data Management (Integration, Quality & Governance)
Copyright 2016, SAS Institute Inc. All rights reserved.


ADVANCED ANALYTICS WHERE TO START?

ENTERPRISE MINER SEMMA IN ACTION REPEATABLE PROCESS

SAMPLE: REORGANIZE YOUR DATA
- Use weighted or stratified sampling to balance the dataset
- Partition data into train, validate and test sets
[Figure: error rate versus model complexity for the training set and the validation set, with the optimum marked.]
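The partitioning step can be sketched outside Enterprise Miner in a few lines of plain Python. This is only an illustration of stratified partitioning, not what the Data Partition node does internally; the function name `stratified_partition`, the 60/20/20 split and the `customers` data are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_partition(rows, target, fractions=(0.6, 0.2, 0.2), seed=42):
    """Split rows into train/validate/test, keeping the target mix in each part."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    train, validate, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)                      # random order within each class
        n_train = round(len(members) * fractions[0])
        n_valid = round(len(members) * fractions[1])
        train += members[:n_train]
        validate += members[n_train:n_train + n_valid]
        test += members[n_train + n_valid:]       # remainder goes to the test set
    return train, validate, test

# Hypothetical data: 100 customers, 20% of them responders.
customers = [{"id": i, "response": int(i % 5 == 0)} for i in range(100)]
train, validate, test = stratified_partition(customers, "response")
print(len(train), len(validate), len(test))
```

Because the split is done per class, each partition keeps the same share of responders as the full dataset, which is the point of stratifying.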

EXPLORE CHECK DATA TO UNDERSTAND VARIABLE VALUES

MODIFY: TRANSFORM VARIABLES TO OPTIMIZE RESULTS
- Transform variables using a math function (e.g. a log transform)
- Standardize numeric values into z-scores ("how far from average")
- Bin numeric variables (dates into tenures, age into buckets)
- Remove outliers (or are they what you are looking for?)
- Group categorical variables into classes
- Impute missing values
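A few of these modifications are easy to illustrate with the Python standard library. The sketch below is not Enterprise Miner code; the `ages` and `incomes` values, the median imputation rule and the bucket boundaries are assumptions chosen for the example.

```python
import math
import statistics

# Hypothetical inputs; None marks a missing value.
ages = [23, 35, None, 41, 29, 58, None, 35]
incomes = [18000, 25000, 32000, 250000]

# Impute: replace missing values with the median of the observed values.
observed = [a for a in ages if a is not None]
median_age = statistics.median(observed)
imputed = [median_age if a is None else a for a in ages]

# Standardize: z-score = (value - mean) / standard deviation.
mean_age = statistics.mean(imputed)
stdev_age = statistics.pstdev(imputed)
z_scores = [(a - mean_age) / stdev_age for a in imputed]

# Bin: map a numeric age onto a small number of buckets.
def age_bucket(age):
    if age < 30:
        return "<30"
    if age < 50:
        return "30-49"
    return "50+"

buckets = [age_bucket(a) for a in imputed]

# Transform: a log compresses a right-skewed variable such as income.
log_incomes = [math.log(x) for x in incomes]

print(imputed, buckets)
```

Imputing before standardizing matters here: the mean and standard deviation are computed on the completed column, so every row gets a z-score.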

MODEL: LIST OF MAIN ALGORITHMS
- Neural networks
- Deep learning
- Decision trees
- Random forests
- Associations and sequence discovery
- Gradient boosting and bagging
- Support vector machines
- Nearest-neighbor mapping
- k-means clustering
- Self-organizing maps
- Local search optimization techniques such as genetic algorithms
- Regression
- Expectation maximization
- Kernel density estimation
- Multivariate adaptive regression splines
- Bayesian networks
- Principal components analysis
- Singular value decomposition
- Gaussian mixture models
- Sequential covering rule building
- Model ensembles

ASSESS EVALUATE MODEL RESULTS AND SCORE

SAS ENTERPRISE MINER: COMPLETE LIST OF NODES
SAMPLE: Append, Data Partition, File Import, Filter, Merge, Sample, Input Data
EXPLORE: Association, Cluster, Graph Explore, Variable Clustering, DMDB, MultiPlot, Market Basket, StatExplore, Link Analysis, Path Analysis, Variable Selection, SOM/Kohonen
MODIFY: Drop, Impute, Interactive Binning, Principal Components, Replacement, Rules Builder, Transform Variables
MODEL: Decision Tree, AutoNeural, Regression, Neural Network, Partial Least Squares, Dmine Regression, DM Neural, Ensemble, Rule Induction, Gradient Boosting, LARS, MBR, Two Stage, Model Import
Incremental Response, Survival Analysis, Credit Scoring*
TS (time series) nodes: TS Correlation, TS Data Prep, TS Dimension Reduction, TS Decomp., TS Similarity, TS Exponential Smoothing
HP (high-performance) nodes: HP Explore, HP Impute, HP Regression, HP Transform, HP Variable Selection, HP Neural, HP Forest, HP Decision Tree, HP Data Partition, HP GLM, HP Cluster, HP Prin, HP SVM, HP BNET, Comp
ASSESS: Cutoff, Decisions, Model Comparison, Score, Segment Profile
UTILITY: Control Point, End Groups, Start Groups, Open Source Integration, Reporter, Score Code Export, Metadata, SAS Code, Ext Demo, Save Data, Register Metadata
*Requires the Credit Scoring for SAS Enterprise Miner add-on license.

MACHINE LEARNING: WHY IS IT SO IMPORTANT NOW?
Data, computing power, algorithms

SAS HIGH-PERFORMANCE DATA MINING: SAS PROCESSING DIRECTLY ATTACHED TO YOUR DATA
[Figure: architecture diagram. Labels: Client, SAS Analytics, SAS HP Data Mining, Database/DW, Hadoop Cluster.]


SAS EM 14.1: HP BAYESIAN NETWORK NODE
- Enables the creation of Bayesian networks. A Bayesian network is a probabilistic graphical model that represents the data and the conditional dependencies via a directed acyclic graph (DAG).
- Supports the following network structures: Naïve, Tree-Augmented Naïve (TAN), Bayesian network Augmented Naïve (BAN), Parent Child (PC) and Markov Blanket.
- Enables automatic network model selection.
- Requires a categorical target variable and categorical or interval (binned) input variables.
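The Naïve structure is the simplest network in that list: every input depends only on the target. The sketch below shows that idea in plain Python, with a categorical target and categorical inputs. It is not the HP Bayesian Network node's implementation; the function names, the Laplace smoothing and the `rows` data are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, target):
    """Count class frequencies and, per class, the frequency of each input value."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)   # (input name, class) -> value counts
    levels = defaultdict(set)             # input name -> observed values
    for row in rows:
        for name, value in row.items():
            if name != target:
                value_counts[(name, row[target])][value] += 1
                levels[name].add(value)
    return class_counts, value_counts, levels

def predict(model, inputs):
    """Return P(class | inputs), assuming inputs are independent given the class."""
    class_counts, value_counts, levels = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total
        for name, value in inputs.items():
            # Laplace smoothing keeps unseen combinations from zeroing the product.
            score *= (value_counts[(name, cls)][value] + 1) / (n_cls + len(levels[name]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical training data with binned (categorical) inputs.
rows = [
    {"age_bin": "young", "channel": "web", "response": "yes"},
    {"age_bin": "young", "channel": "web", "response": "yes"},
    {"age_bin": "young", "channel": "store", "response": "yes"},
    {"age_bin": "old", "channel": "store", "response": "no"},
    {"age_bin": "old", "channel": "store", "response": "no"},
    {"age_bin": "old", "channel": "web", "response": "no"},
]
model = fit_naive_bayes(rows, "response")
posterior = predict(model, {"age_bin": "young", "channel": "web"})
print(posterior)
```

The other structures on the slide (TAN, BAN, PC, Markov Blanket) relax the independence assumption by allowing additional arcs between variables, which this sketch does not attempt.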

SAS EM 13.1: HP SUPPORT VECTOR MACHINE NODE
- Enables the creation of linear and nonlinear support vector machine models. A support vector machine is a supervised machine-learning method that is used to perform classification and regression analysis.
- Constructs separating hyperplanes that maximize the margin between two classes.
- Enables the use of a variety of kernels: linear, polynomial, radial basis function, and sigmoid function.
- Provides interior point and active set optimization methods.
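The four kernel families named above have common textbook forms, sketched here in plain Python. The parameter names (`degree`, `gamma`, `coef0`) and their default values are assumptions for illustration and are not the node's property names.

```python
import math

def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    return (dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # Radial basis function: depends only on the squared distance between x and z.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, z, gamma=0.5, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)

# Hypothetical two-dimensional observations.
x = [1.0, 2.0]
z = [3.0, 0.5]
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, z))
```

A kernel replaces the plain inner product in the margin computation, which is how a linear separating hyperplane in the transformed space becomes a nonlinear boundary in the original inputs.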

SAS EM 13.2: HP FOREST NODE
- A forest consists of several decision trees that differ from each other in two ways. First, the training data for a tree is a sample without replacement from all available observations. Second, the input variables that are considered for splitting a node are randomly selected from all available inputs. In other respects, trees in a forest are trained like standard trees.
- Adds support for partitioned validation data. HP Forest now performs variable selection using the data partitioned for validation, instead of out-of-bag (OOB) data. The HP Forest iteration history plot and table also use partitioned validation data.
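The two sources of randomness described above can be sketched as follows. This is a toy illustration in plain Python, not the HP Forest algorithm; the function names, the 60% sample fraction, the number of candidate inputs and the input names are assumptions.

```python
import random

def sample_rows_for_tree(n_obs, sample_fraction, rng):
    """Row indices for one tree, drawn WITHOUT replacement."""
    k = round(n_obs * sample_fraction)
    return rng.sample(range(n_obs), k)

def candidate_inputs_for_split(input_names, n_candidates, rng):
    """Random subset of inputs considered when splitting one node."""
    return rng.sample(input_names, n_candidates)

rng = random.Random(0)
inputs = ["age", "income", "tenure", "region"]   # hypothetical input names

# Each tree gets its own row sample; each split gets its own candidate inputs.
for tree in range(3):
    rows = sample_rows_for_tree(10, 0.6, rng)
    candidates = candidate_inputs_for_split(inputs, 2, rng)
    print(tree, sorted(rows), candidates)
```

Because `random.sample` never repeats an element, no observation appears twice in a tree's training data, which is what distinguishes this scheme from the bootstrap (with replacement) sampling used in many other random forest implementations.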

SAS ENTERPRISE MINER: HIGH-PERFORMANCE NODES AND PROCEDURES
- The high-performance functionality is available as nodes in the interface
- It is also available as procedures from any coding interface

EXAMPLE CASE: PREDICT CUSTOMER RESPONSE TO RETAIL MARKETING
Stages: data exploration, model development, model deployment
Current process:
- Neural network method (1 iteration)
- 5 hours to process model
- Limited to 1 or 2 modeling methods
- Model lift of 1.6%
High-performance process:
- Neural network method (100 iterations)
- 6 minutes to process model
- Experiment with multiple modeling methods
- Model lift of 3.2%
Callout on slide: 84 SECONDS

SAS ENTERPRISE MINER: OPEN SOURCE INTEGRATION NODE (R SUPPORT)
- Allows users to integrate R code (supervised and unsupervised models) inside a SAS Enterprise Miner process flow diagram.
- Provides flexibility to include R code within a data mining flow, using EM for data prep, R for modeling, and then EM for deployment.
- Includes R models in model assessment alongside models generated by SAS Enterprise Miner and, in some R-generated PMML cases, produces corresponding SAS DATA step scoring code.

https://communities.sas.com/t5/SAS-Communities-Library/The-Open-Source-Integration-Node-Installation-Cheat-Sheet/ta-p/223470

ADVANCED ANALYTICS: TEXT MINER ADD-ON TO ENTERPRISE MINER
- Discovering and using knowledge which exists in the document collection as a whole
- Uncovering patterns within the document collection
- Establishing connections between documents and the terms in the collection as a whole
- Combining free-form text and quantitative variables to derive information and to make better predictions

SAS TEXT MINER: TEXT MINING PROCESS
[Figure: typical SAS Enterprise Miner text mining process flow. Labels: Raw Data, Text Mining, Predictive Modeling. Annotation: change Text Topic node values for basic sentiment.]

SAS TEXT MINER: TEXT MINING NODES
Users control the Text Miner nodes by modifying their default properties. Part of the Text Parsing node properties:
- Different Parts of Speech
- Find Entities
- Multi-word Terms
- Synonyms
- Stop or Start List
- Minimum Number of Documents
- SVD Resolution
- Max SVD Dimensions
- Number of Terms to Display
- And more!
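Two of these properties, the stop list and the minimum number of documents, are easy to picture with a toy term-by-document count. The sketch below is plain Python and only illustrates the idea; it is not how the Text Parsing node works internally, and the function name, the tokenization rule and the sample documents are assumptions. The SVD step that would follow is not shown.

```python
from collections import Counter

def term_document_counts(documents, stop_list=(), min_documents=2):
    """Build a small term-by-document count matrix.

    Terms on the stop list are dropped, and so are terms that occur in
    fewer than min_documents documents.
    """
    stop = {word.lower() for word in stop_list}
    parsed = []
    for text in documents:
        tokens = [t.strip(".,!?").lower() for t in text.split()]
        parsed.append(Counter(t for t in tokens if t and t not in stop))

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for counts in parsed for term in counts)
    kept_terms = sorted(t for t, df in doc_freq.items() if df >= min_documents)
    matrix = [[counts[t] for t in kept_terms] for counts in parsed]
    return kept_terms, matrix

# Hypothetical customer comments.
comments = [
    "The delivery was late.",
    "Late delivery, poor service.",
    "Great service and great price.",
]
terms, matrix = term_document_counts(comments, stop_list=["the", "was", "and"])
print(terms)
for row in matrix:
    print(row)
```

Raising `min_documents` shrinks the vocabulary, which in turn limits the size of the matrix that a later dimension reduction step has to handle.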

TEXT CLUSTERS

TEXT TOPICS

TEXT PROFILE

SAS ANALYTICS IN ACTION Discovery Deployment Data

THANK-YOU