INSIGHTS@SAS: ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA
AGENDA
09.00 - 09.15  Intro
09.15 - 10.30  Analytics using SAS Enterprise Guide (Ellen Lokollo)
10.45 - 12.00  Advanced Analytics using SAS Enterprise Miner (Rens Feenstra)
12.00 - 13.00  Lunch
13.00 - 14.15  Advanced programming: to get better performance from your SAS code (Alfredo Iglesias Rey)
14.30 - 16.00  ABN AMRO presents
               - Cees Harlaar: Project INSPIRE
               - Arthur Usov: Dynamic Linear Modelling From Data to Insights
               - Pim Veeger: SAS on Linux
               - Leon Ellermeijer: SAS Improvements Project
16.00 - 16.15  Wrap up
AGENDA
- Advanced Analytics / Data Mining / Machine Learning
- SEMMA
- Rapid Predictive Modeler
- Enterprise Miner
- High Performance Analytics
- R integration
- Text Miner
- Analytic Lifecycle
ADVANCED ANALYTICS HOW ADVANCED IS ADVANCED?
Machine Learning
ADVANCED ANALYTICS WHAT IS MACHINE LEARNING?
Machine learning is a branch of artificial intelligence that automates the building of systems that learn iteratively from data, identify patterns, and predict future results with minimal human intervention. It shares many approaches with related fields, but it focuses on the predictive accuracy of the model rather than its interpretability.
ADVANCED ANALYTICS MACHINE LEARNING IS NOT A NEW DISCIPLINE
[Diagram of related fields: Statistics, Pattern Recognition, Computational Neuroscience, Data Science, Data Mining, AI, Databases, Machine Learning, KDD. Graphic from the SAS Data Mining Primer course, 1998.]
ADVANCED ANALYTICS MACHINE LEARNING INCLUDES A COMPREHENSIVE SET OF METHODS
Methods shown: local search optimization, k-means clustering, Bayesian networks, gradient boosting, deep learning, random forests, decision trees, regression, neural networks, principal components, model ensembles, support vector machines
Spectrum: from traditional (easy to explain, often good enough) to the latest techniques (complex, can be more accurate)
SAS covers the full range from Regression to Deep Learning
ADVANCED ANALYTICS WHY IS MACHINE LEARNING SO IMPORTANT NOW? Data Computing Power Algorithms
ANALYTICS
- TEXT ANALYTICS: Finding treasures in unstructured data like social media or survey tools that could uncover insights about key business challenges
- FORECASTING: Leveraging historical data to drive better insight into proactive decision-making
- OPTIMIZATION: Analyze massive amounts of data in order to accurately identify areas likely to produce the most profitable results
- DATA MINING / MACHINE LEARNING: Mine transaction databases to create models of likely outcomes
- STATISTICS
Data Management (Integration, Quality & Governance)
Copyright 2016, SAS Institute Inc. All rights reserved.
ADVANCED ANALYTICS WHERE TO START?
ENTERPRISE MINER SEMMA IN ACTION REPEATABLE PROCESS
SAMPLE REORGANIZE YOUR DATA
- Use weighted or stratified sampling to balance the dataset
- Partition the data into training, validation and test sets
[Figure: error rate versus model complexity for the training set and the validation set, with the optimum marked]
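The partitioning step can be illustrated outside Enterprise Miner. The following is a minimal Python sketch, not the Data Partition node itself: the function name, the 60/20/20 split and the example data are assumptions made for illustration. It splits each target class separately so that all three sets keep the same class mix.

```python
import random
from collections import defaultdict

def stratified_partition(rows, target, fractions=(0.6, 0.2, 0.2), seed=42):
    """Split rows into train/validate/test, preserving the target mix in each part."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    train, validate, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_valid = round(len(members) * fractions[1])
        train.extend(members[:n_train])
        validate.extend(members[n_train:n_train + n_valid])
        test.extend(members[n_train + n_valid:])  # remainder goes to the test set
    return train, validate, test

# Hypothetical data: 10% responders, 90% non-responders.
rows = [{"id": i, "response": 1 if i % 10 == 0 else 0} for i in range(1000)]
train, validate, test = stratified_partition(rows, "response")
print(len(train), len(validate), len(test))  # prints: 600 200 200
```

Each of the three sets keeps the 10% response rate of the full dataset, which is what stratification is for.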
EXPLORE CHECK DATA TO UNDERSTAND VARIABLE VALUES
MODIFY TRANSFORM VARIABLES TO OPTIMIZE RESULTS
- Transform variables using a math function (e.g. log)
- Standardize numeric values into z-scores ("how far from average")
- Bin numeric variables (dates into tenures, age into buckets)
- Remove outliers (or are they what you are looking for?)
- Group categorical variables into classes
- Impute missing values
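Three of these modifications (imputation, z-scores, binning) are easy to show in a few lines. This is a rough Python sketch using only the standard library; the helper names, the age values and the bin edges are hypothetical and are not taken from the Enterprise Miner nodes.

```python
from statistics import mean, median, pstdev

def impute_median(values):
    """Replace missing values (None) with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]

def z_scores(values):
    """Express each value as the number of standard deviations from the average."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds the value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]  # values at or above the last edge fall in the final bin

ages = [23, 35, None, 51, 64, 42]  # hypothetical input with one missing value
ages_filled = impute_median(ages)
ages_z = z_scores(ages_filled)
age_bands = [bin_value(a, edges=[30, 50], labels=["<30", "30-49", "50+"])
             for a in ages_filled]
print(ages_filled)
print(age_bands)
```

The order matters in this sketch: impute first, because the z-score and the binning both need a value for every row.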
MODEL LIST OF MAIN ALGORITHMS
- Neural networks
- Deep Learning
- Decision trees
- Random forests
- Associations and sequence discovery
- Gradient boosting and bagging
- Support vector machines
- Nearest-neighbor mapping
- k-means clustering
- Self-organizing maps
- Local search optimization techniques such as genetic algorithms
- Regression
- Expectation maximization
- Kernel density estimation
- Multivariate adaptive regression splines
- Bayesian networks
- Principal components analysis
- Singular value decomposition
- Gaussian mixture models
- Sequential covering rule building
- Model Ensembles
ASSESS EVALUATE MODEL RESULTS AND SCORE
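One common way to assess a scored model is lift. The slides do not define the measure, so this minimal Python sketch assumes the usual ratio form: the response rate among the highest-scored fraction of cases divided by the overall response rate. The function name and the example scores are hypothetical.

```python
def lift_at(scores, actuals, fraction=0.1):
    """Response rate in the top-scored fraction divided by the overall response rate."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(actual for _, actual in ranked[:n_top]) / n_top
    overall_rate = sum(actuals) / len(actuals)
    return top_rate / overall_rate

# Hypothetical model scores and observed responses (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift_top_20 = lift_at(scores, actuals, fraction=0.2)
print(round(lift_top_20, 2))
```

A lift above 1 means that targeting the top-scored cases finds more responders than selecting cases at random.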
SAS ENTERPRISE MINER COMPLETE LIST OF NODES
SAMPLE: Append, Data Partition, File Import, Filter, Merge, Sample, Input Data
EXPLORE: Association, Cluster, Graph Explore, Variable Clustering, DMDB, MultiPlot, Market Basket, StatExplore, Link Analysis, Path Analysis, Variable Selection, SOM/Kohonen
MODIFY: Drop, Impute, Interactive Binning, Principal Components, Replacement, Rules Builder, Transform Variables
MODEL:
  Decision Tree, AutoNeural, Regression, Neural Network, Partial Least Squares, Dmine Regression, DM Neural, Ensemble, Rule Induction, Gradient Boosting, LARS, MBR, Two Stage, Model Import
  Incremental Response, Survival Analysis, Credit Scoring*
  TS Correlation, TS Data Prep, TS Dimension Reduction, TS Decomp., TS Similarity, TS Exponential Smoothing
  HP Explore, HP Impute, HP Regression, HP Transform, HP Variable Selection, HP Neural, HP Forest, HP Decision Tree, HP Data Partition, HP GLM, HP Cluster, HP Prin Comp, HP SVM, HP BNET
ASSESS: Cutoff, Decisions, Model Comparison, Score, Segment Profile
UTILITY: Control Point, End Groups, Start Groups, Open Source Integration, Reporter, Score Code Export, Metadata, SAS Code, Ext Demo, Save Data, Register Metadata
*Requires Credit Scoring for SAS Enterprise Miner add-on license.
MACHINE LEARNING WHY IS IT SO IMPORTANT NOW? Data Computing Power Algorithms
SAS HIGH-PERFORMANCE DATA MINING SAS PROCESSING DIRECTLY ATTACHED TO YOUR DATA
[Diagram labels: Database/DW, SAS HP Data Mining, SAS ANALYTICS Client, Hadoop Cluster]
SAS EM 14.1 HP BAYESIAN NETWORK NODE
- Enables the creation of Bayesian networks: probabilistic graphical models that represent the variables in the data and their conditional dependencies via a directed acyclic graph (DAG).
- Supports the following network structures: Naïve, Tree-Augmented Naïve (TAN), Bayesian network Augmented Naïve (BAN), Parent Child (PC) and Markov Blanket.
- Enables automatic network model selection.
- Requires a categorical target variable and categorical or interval (binned) input variables.
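The Naïve structure is the simplest of these networks: every input depends only on the target. As an illustration of that idea, and not of how the HP Bayesian Network node is implemented, here is a small Python sketch of a naive Bayes classifier for categorical inputs. The function names, the add-one smoothing and the example rows are assumptions.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count class frequencies and, per class, the frequency of each input value."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (input name, class) -> counts per value
    levels = defaultdict(set)            # input name -> distinct values seen
    for row in rows:
        for name, value in row.items():
            if name != target:
                value_counts[(name, row[target])][value] += 1
                levels[name].add(value)
    return class_counts, value_counts, levels

def predict(model, inputs):
    """Return the posterior probability of each class for one case."""
    class_counts, value_counts, levels = model
    total = sum(class_counts.values())
    posterior = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # prior P(class)
        for name, value in inputs.items():
            # P(value | class) with add-one smoothing to avoid zero probabilities
            p *= (value_counts[(name, cls)][value] + 1) / (n_cls + len(levels[name]))
        posterior[cls] = p
    norm = sum(posterior.values())
    return {cls: p / norm for cls, p in posterior.items()}

# Hypothetical training data with a categorical target and categorical inputs.
rows = [
    {"segment": "young", "channel": "web", "response": "yes"},
    {"segment": "young", "channel": "web", "response": "yes"},
    {"segment": "young", "channel": "store", "response": "no"},
    {"segment": "senior", "channel": "store", "response": "no"},
    {"segment": "senior", "channel": "store", "response": "no"},
    {"segment": "senior", "channel": "web", "response": "yes"},
]
model = train_naive_bayes(rows, "response")
probabilities = predict(model, {"segment": "young", "channel": "web"})
print(probabilities)
```

The sketch also shows why the node asks for categorical or binned inputs: the model is built from counts of value combinations.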
SAS EM 13.1 HP SUPPORT VECTOR MACHINE NODE
- Enables the creation of linear and nonlinear support vector machine models.
- A supervised machine-learning method that is used to perform classification and regression analysis.
- Constructs separating hyperplanes that maximize the margin between two classes.
- Enables the use of a variety of kernels: linear, polynomial, radial basis function, and sigmoid function.
- Also provides interior point and active set optimization methods.
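The four kernel families named above can be written down in a few lines. This Python sketch uses common textbook forms; the parameter names (gamma, coef0, degree) and their defaults are assumptions and may not match the parametrization used by the HP SVM node.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # Radial basis function: similarity decays with squared Euclidean distance.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

x, y = [1.0, 2.0], [2.0, 0.5]  # two hypothetical observations
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, y))
```

A nonlinear kernel lets the same margin-maximizing machinery separate classes that no straight hyperplane in the original inputs can separate.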
SAS EM 13.2 HP FOREST NODE
A forest consists of several decision trees that differ from each other in two ways. First, the training data for a tree is a sample without replacement from all available observations. Second, the input variables that are considered for splitting a node are randomly selected from all available inputs. In other respects, trees in a forest are trained like standard trees.
- Adds support for partitioned validation data.
- HP Forest now performs variable selection using the data partitioned for validation, instead of out-of-bag (OOB) data.
- The HP Forest iteration history plot and table also use partitioned validation data.
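The two sources of randomness described above can be sketched in Python. This shows only the sampling, not the tree training, and it is not the HP Forest implementation: the function names, the 60% sample fraction, the number of candidate inputs and the input names are all assumptions.

```python
import random

def sample_training_rows(n_obs, fraction, rng):
    """Draw the row indices for one tree, without replacement."""
    return rng.sample(range(n_obs), round(n_obs * fraction))

def candidate_inputs(input_names, n_candidates, rng):
    """Draw the inputs that one node is allowed to consider for its split."""
    return rng.sample(input_names, n_candidates)

rng = random.Random(0)
inputs = ["age", "income", "tenure", "region", "channel"]  # hypothetical inputs
for tree in range(3):
    rows = sample_training_rows(100, 0.6, rng)
    out_of_bag = set(range(100)) - set(rows)     # observations this tree never saw
    split_inputs = candidate_inputs(inputs, 2, rng)
    print(tree, len(rows), len(out_of_bag), split_inputs)
```

The observations left out of a tree's sample are its out-of-bag data, the set that the 13.2 release replaces with partitioned validation data for variable selection.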
SAS ENTERPRISE MINER HIGH-PERFORMANCE NODES AND PROCEDURES
- The high-performance functionality is available not only as nodes in the interface
- It is also available as procedures that can be called from any coding interface
EXAMPLE CASE PREDICT CUSTOMER RESPONSE TO RETAIL MARKETING
Stages: DATA EXPLORATION, MODEL DEVELOPMENT, MODEL DEPLOYMENT
Current Process
- Neural Network Method (1 iteration)
- 5 hours to process model
- Limited to 1 or 2 modeling methods
- Model lift of 1.6%
High-Performance Process
- Neural Network Method (100 iterations)
- 6 minutes to process model
- Experiment with multiple modeling methods
- Model lift of 3.2%
Callout: 84 SECONDS
SAS ENTERPRISE MINER OPEN SOURCE INTEGRATION NODE (R SUPPORT)
- Allows users to integrate R code (supervised and unsupervised models) inside a SAS Enterprise Miner process flow diagram.
- Provides flexibility to include R code within a data mining flow, using EM for data prep, R for modeling, and then EM for deployment.
- Includes R models in model assessment alongside models generated by SAS Enterprise Miner and, in some R-generated PMML cases, produces the corresponding SAS DATA step scoring code.
https://communities.sas.com/t5/SAS-Communities-Library/The-Open-Source-Integration-Node-Installation-Cheat-Sheet/ta-p/223470
ADVANCED ANALYTICS TEXT MINER ADD-ON TO ENTERPRISE MINER
- Discovering and using knowledge that exists in the document collection as a whole
- Uncovering patterns within the document collection
- Establishing connections between documents and the terms in the collection as a whole
- Combining free-form text and quantitative variables to derive information and to make better predictions
SAS TEXT MINER TEXT MINING PROCESS
Typical SAS Enterprise Miner text mining process flow
[Diagram labels: Change Text Topic Node Values for Basic Sentiment, Text Mining, Raw Data, Predictive Modeling]
SAS TEXT MINER TEXT MINING NODES
Users control the Text Miner nodes by modifying their default properties.
Part of the Text Parsing node properties
- Different Parts of Speech
- Find Entities
- Multi-word Terms
- Synonyms
- Stop or Start List
- Minimum Number of Documents
- SVD Resolution
- Max SVD Dimensions
- Number of Terms to Display
- And more!
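Two of these properties, the stop list and the minimum number of documents, can be illustrated with a toy parser. This is a minimal Python sketch of the general idea, not the Text Parsing node: the function name, the whitespace tokenizer and the three example documents are assumptions.

```python
from collections import Counter

def parse_documents(documents, stop_list, min_documents=2):
    """Build a term-by-document count matrix after applying a stop list
    and dropping terms that occur in fewer than min_documents documents."""
    tokenized = [[word for word in doc.lower().split() if word not in stop_list]
                 for doc in documents]
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    kept_terms = sorted(t for t, n in doc_freq.items() if n >= min_documents)
    matrix = [[tokens.count(term) for term in kept_terms] for tokens in tokenized]
    return kept_terms, matrix

# Hypothetical survey comments.
documents = [
    "the delivery was late",
    "late delivery and poor service",
    "great service and fast delivery",
]
stop_list = {"the", "was", "and"}
terms, matrix = parse_documents(documents, stop_list, min_documents=2)
print(terms)   # prints: ['delivery', 'late', 'service']
print(matrix)  # prints: [[1, 1, 0], [1, 1, 1], [1, 0, 1]]
```

A matrix like this, usually much larger and sparser, is what a singular value decomposition then reduces to a small number of dimensions.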
TEXT CLUSTERS
TEXT TOPICS
TEXT PROFILE
SAS ANALYTICS IN ACTION Discovery Deployment Data
THANK-YOU