Types of Data Mining


Data Mining and The Use of SAS to Deploy Scoring Rules
South Central SAS Users Group Conference
Neil Fleming, Ph.D., ASQ CQE
November 7-9, W Systems Co., Inc

Types of Data Mining

Supervised Classification (target):
  Logistic regression (discrete outcome)
  Multiple regression (continuous outcome)
  Decision trees (discrete outcome)
  Regression trees (continuous outcome)
  Neural Nets (discrete and continuous outcomes)
Unsupervised Classification (no target):
  Cluster analysis (K-means, hierarchical, etc.)
  Self-organizing maps (SOMs)

The Goal: Prediction Versus Explanation

What type of action will be taken?
  Regression: Explanation & Prediction
  Decision trees: Explanation & Prediction
  Neural Nets: Prediction

Decision Trees

Finds variables at different levels to best:
  Maximize heterogeneity between groups
  Maximize homogeneity within groups
Non-linear (interaction)
Merges categories that are the same (no statistically significant difference)
Discretizes continuous variables (preserving ordinality)
Uses missing data

Picking a Tool

A subsidiary of Forrester Research, Inc. examined four data mining products:
  1) SAS Enterprise Miner (EM)
  2) SPSS Clementine
  3) IBM DB2 Intelligent Miner (IM)
  4) Oracle Data Mining (ODM)

Decision Tree Deliverables

Segments data into terminal nodes
Provides profiles for explanation & prediction
Creates rules for scoring (prediction)

Decision Tree Algorithms

Goals & Methods
  CHAID (Chi-Square Automatic Interaction Detection)
  CART (Classification & Regression Trees)
  QUEST
Picking the Best Tree
  Training, Testing, and Validation
  Cross-Validation with Hold-out samples
  Metrics: Gains Tables (ROI) & Classification Error
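As a rough illustration of the chi-square test that gives CHAID its name, the sketch below computes a Pearson chi-square statistic for one candidate split. It is written in Python rather than SAS, the function name `pearson_chi_square` and the counts are hypothetical, and it is not the full CHAID procedure (no category merging or significance adjustment). It assumes every row and column total is non-zero.

```python
from typing import List


def pearson_chi_square(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the split were unrelated to the target.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are the two sides of a candidate split,
# columns are target = No / Yes.
candidate_split = [[10, 20], [20, 10]]
print(round(pearson_chi_square(candidate_split), 4))
```

A larger statistic means the two sides of the split differ more in their target mix, which is the "heterogeneity between groups" a tree is looking for.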

SAS: Data Mining Leader

SAS was chosen as the leader in functionality for architecture, algorithms, and data access
SPSS was chosen as the leader in usability: collaboration between statisticians, data preparers, and business analysts
SAS was chosen as the leader in support, with a slight edge over SPSS
IBM was noted for its in-database modeling & deployment of scoring

Price of Server Version: Initial and Renewal (lowest range)
  SAS EM: $119K/$39K, with Base SAS & SAS/STAT needed
  SPSS Clementine: $75K
  IBM DB2 IM: $18,750/$3,750 (probably as add-on) through Data Warehouse Standard Edition, which includes many other products
  Oracle ODM: $20K/CPU, with different percentages for perpetual licenses

My company is not a Fortune 100.

Another Solution
  Dedicated software for decision tree modeling

[Tree diagram: terminal nodes 3, 4, 5, and 6]

Gain Summary by Node
Target variable: Has Amex card
Target category: Yes
Statistics: Nodes | Node: n | Node: % | Gain: n | Gain (%) | Resp: % | Index (%)
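The columns of a gain summary can be derived from two counts per node: the node size and the number of target responders. The Python sketch below shows one common way to compute them, assuming Index (%) means the node response rate divided by the overall response rate. The node sizes are taken from the SAS log later in the deck; the responder counts are made up for illustration, and `gain_summary` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def gain_summary(nodes: List[Tuple[int, int, int]]) -> List[Dict[str, float]]:
    """Build gain-table rows from (node_id, node_n, responders) tuples.

    Rows are ordered from the highest to the lowest response rate.
    """
    total_n = sum(n for _, n, _ in nodes)
    total_resp = sum(r for _, _, r in nodes)
    overall_rate = total_resp / total_n
    rows = []
    for node_id, n, resp in sorted(nodes, key=lambda t: t[2] / t[1], reverse=True):
        rate = resp / n
        rows.append({
            "node": node_id,
            "node_n": n,
            "node_pct": 100 * n / total_n,         # share of all cases
            "gain_n": resp,
            "gain_pct": 100 * resp / total_resp,   # share of all responders
            "resp_pct": 100 * rate,                # response rate in node
            "index_pct": 100 * rate / overall_rate,
        })
    return rows


# Node sizes from the log (86, 79, 108, 50); responder counts are hypothetical.
for row in gain_summary([(3, 86, 12), (4, 79, 20), (5, 108, 70), (6, 50, 15)]):
    print(row)
```

An index above 100 marks a node whose response rate beats the overall rate.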

Gain Summary - In Deciles
Target variable: Has Amex card
Target category: Yes
Statistics: Nodes | Percentile | Percentile: n | Gain: n | Gain (%) | Resp: %

SQL Rules

/* Node 3 */
UPDATE <TABLE>
SET nod_001 = 3, pre_001 = 0, prb_001 =
WHERE ((PAY_WEEK IS NULL) OR (PAY_WEEK <= 1)) AND ((CLASS IS NULL) OR (CLASS <= 3));

/* Node 4 */
UPDATE <TABLE>
SET nod_001 = 4, pre_001 = 0, prb_001 =
WHERE ((PAY_WEEK IS NULL) OR (PAY_WEEK <= 1)) AND (NOT(CLASS IS NULL) AND (CLASS > 3));

SQL Rules (Continued)

/* Node 5 */
UPDATE <TABLE>
SET nod_001 = 5, pre_001 = 1, prb_001 =
WHERE (NOT(PAY_WEEK IS NULL) AND (PAY_WEEK > 1)) AND ((AGE IS NULL) OR (AGE <= 2));

/* Node 6 */
UPDATE <TABLE>
SET nod_001 = 6, pre_001 = 0, prb_001 =
WHERE (NOT(PAY_WEEK IS NULL) AND (PAY_WEEK > 1)) AND (NOT(AGE IS NULL) AND (AGE > 2));

[Figure: Gains % Chart Based on Deciles]
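The four node rules partition every record: the first test is on PAY_WEEK, and missing values always follow the "less than or equal" branch. A minimal Python sketch of the same logic is shown here for readers who want to check the rules outside SQL. The function names are hypothetical, `None` stands in for NULL, and the node probabilities are left out because they are not given in the rules above.

```python
from typing import Optional, Tuple


def low_or_missing(value: Optional[float], cutoff: float) -> bool:
    """Mirror of the SQL test (X IS NULL) OR (X <= cutoff)."""
    return value is None or value <= cutoff


def assign_node(pay_week: Optional[float],
                klass: Optional[float],
                age: Optional[float]) -> Tuple[int, int]:
    """Return (nod_001, pre_001) for one record, following the node rules."""
    if low_or_missing(pay_week, 1):
        # Nodes 3 and 4 split on CLASS.
        return (3, 0) if low_or_missing(klass, 3) else (4, 0)
    # PAY_WEEK is present and greater than 1: nodes 5 and 6 split on AGE.
    return (5, 1) if low_or_missing(age, 2) else (6, 0)


print(assign_node(None, None, None))  # all inputs missing
print(assign_node(2, 1, 3))           # PAY_WEEK > 1 and AGE > 2
```

Because each pair of conditions is complementary, every record lands in exactly one node, so the order in which the UPDATE statements run does not matter.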

Misclassification Matrix
Rows: Predicted Category (No, Yes, Total)
Columns: Actual Category (No, Yes, Total)

Risk Statistics
Risk Estimate = (95+47)/323
SE of Risk Estimate = Sqrt[(.45*(1-.45))/323]

SAS Log

libname in 'e:/notsug';
NOTE: Libref IN was successfully assigned as follows:
      Engine:        V8
      Physical Name: e:\notsug
356  %let dsn=credit; Data Assign;
SYMBOLGEN:  Macro variable DSN resolves to Credit
359  Set in.&dsn; /*SAS Data set coming in to be segmented*/;
360  nod_001=.;
361  pre_001=.;
362  prb_001=.;
NOTE: There were 323 observations read from the data set IN.CREDIT.
NOTE: The data set WORK.ASSIGN has 323 observations and 8 variables.
NOTE: DATA statement used:
      real time  0.04 seconds
      cpu time   0.04 seconds
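The risk estimate is the share of misclassified cases, and its standard error follows the usual binomial formula sqrt(p(1-p)/n). The short Python sketch below reproduces both figures from the counts on the slide; the function names are hypothetical. The ratio (95+47)/323 works out to about 0.44, while the slide plugs .45 into the standard error formula, so both inputs are shown.

```python
import math


def risk_estimate(misclassified: int, total: int) -> float:
    """Proportion of cases the tree classifies incorrectly."""
    return misclassified / total


def risk_standard_error(risk: float, total: int) -> float:
    """Binomial standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(risk * (1.0 - risk) / total)


risk = risk_estimate(95 + 47, 323)
print(round(risk, 4))                             # ratio from the slide
print(round(risk_standard_error(risk, 323), 4))   # SE using the exact ratio
print(round(risk_standard_error(0.45, 323), 4))   # SE using the slide's .45
```

Either input gives a standard error of roughly 0.028, so the rounding does not change the conclusion.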

     Proc SQL;
     /* Node 3*/
366  UPDATE Assign
367  SET nod_001 = 3, pre_001 = 0, prb_001 =
     WHERE ((PAY_WEEK IS NULL) OR (PAY_WEEK <= 1)) AND ((CLASS IS NULL) OR (CLASS <= 3));
NOTE: 86 rows were updated in WORK.ASSIGN
     /* Node 4*/
371  UPDATE Assign
372  SET nod_001 = 4, pre_001 = 0, prb_001 =
     WHERE ((PAY_WEEK IS NULL) OR (PAY_WEEK <= 1)) AND (NOT(CLASS IS NULL) AND (CLASS > 3));
NOTE: 79 rows were updated in WORK.ASSIGN
     /* Node 5*/
376  UPDATE Assign
377  SET nod_001 = 5, pre_001 = 1, prb_001 =
     WHERE (NOT(PAY_WEEK IS NULL) AND (PAY_WEEK > 1)) AND ((AGE IS NULL) OR (AGE <= 2));
NOTE: 108 rows were updated in WORK.ASSIGN
     /* Node 6*/
381  UPDATE Assign
382  SET nod_001 = 6, pre_001 = 0, prb_001 =
     WHERE (NOT(PAY_WEEK IS NULL) AND (PAY_WEEK > 1)) AND (NOT(AGE IS NULL) AND (AGE > 2));
NOTE: 50 rows were updated in WORK.ASSIGN.
384
NOTE: PROCEDURE SQL used:
      real time  0.19 seconds
      cpu time   0.19 seconds

385  Data Assign;
386  Set Assign;
387  If prb_001=. then Prob=0;
388  else If pre_001=0 then Prob=1-prb_001;
389  Else if pre_001=1 then Prob=prb_001;
390  /* This assigns the Probability for Target Outcome 1 */;
391  Run;
NOTE: There were 323 observations read from the data set WORK.ASSIGN.
NOTE: The data set WORK.ASSIGN has 323 observations and 9 variables.
NOTE: DATA statement used:
      real time  0.05 seconds
      cpu time   0.05 seconds
     proc summary data=assign;
394  class nod_001;
395  var Prob;output out=statb mean=mean_prob sum=sum_prob;
396  run;
NOTE: There were 323 observations read from the data set WORK.ASSIGN.
NOTE: The data set WORK.STATB has 5 observations and 5 variables.

Analysis of Credit Card Data          10:32 Monday, April 5, 2004
Segments with Active Cards
Dsn=Credit

Obs  nod_001  _TYPE_  _FREQ_  mean_prob  sum_prob
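The DATA step converts each node's stored probability into the probability of target outcome 1, and PROC SUMMARY then reports the mean and sum by node. Since the CLASS statement is used without NWAY, the output holds one overall row plus one row per node, which matches the 5 observations in the log. The Python sketch below mirrors that logic under stated assumptions: the names are hypothetical, `None` stands in for a SAS missing value, and the sample probabilities are invented.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def target_probability(pre: Optional[int], prb: Optional[float]) -> Optional[float]:
    """Probability of target outcome 1, following the DATA step logic."""
    if prb is None:
        return 0.0
    if pre == 0:
        return 1.0 - prb   # stored probability refers to outcome 0
    if pre == 1:
        return prb
    return None            # no branch matched, as in the DATA step


def summarize(rows: List[Tuple[int, int, float]]) -> Dict[Optional[int], Tuple[float, float]]:
    """Return {node: (mean_prob, sum_prob)}; key None is the overall row."""
    groups: Dict[Optional[int], List[float]] = defaultdict(list)
    for node, pre, prb in rows:
        prob = target_probability(pre, prb)
        if prob is not None:
            groups[node].append(prob)
            groups[None].append(prob)
    return {k: (sum(v) / len(v), sum(v)) for k, v in groups.items()}


# Hypothetical (nod_001, pre_001, prb_001) records.
sample = [(3, 0, 0.9), (3, 0, 0.9), (5, 1, 0.7)]
for node, (mean_prob, sum_prob) in summarize(sample).items():
    print(node, round(mean_prob, 4), round(sum_prob, 4))
```

The sum of the probabilities in a node is the expected number of responders in that segment, which is what makes `sum_prob` useful for planning.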

Conclusion

Use a dedicated software product that is affordable
Combine with SAS SQL for deploying scoring rules
Create a powerful application for data mining
Provide explanation that is ACTIONABLE with prediction


Scoring with Analytic Stores Scoring with Analytic Stores Merve Yasemin Tekbudak, SAS Institute Inc., Cary, NC In supervised learning, scoring is the process of applying a previously built predictive model to a new data set in order

More information

DATACENTER AS A SERVICE. We unburden you at the level you desire

DATACENTER AS A SERVICE. We unburden you at the level you desire DATACENTER AS A SERVICE We unburden you at the level you desire MARKET TREND BY VARIOUS ANALYSTS The concept of flexible and scalable computing is a key reason to create a Cloud based architecture 77%

More information

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA INSIGHTS@SAS: ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA AGENDA 09.00 09.15 Intro 09.15 10.30 Analytics using SAS Enterprise Guide Ellen Lokollo 10.45 12.00 Advanced Analytics using SAS

More information

SPSS Statistics 19.0 Fix Pack 2 Fix List Release notes Abstract Content Number Description

SPSS Statistics 19.0 Fix Pack 2 Fix List Release notes Abstract Content Number Description SPSS Statistics 19.0 Fix Pack 2 Fix List Release notes Abstract A comprehensive list of defect corrections for the SPSS Statistics 19.0 Fix Pack 2. Details of the fixes are listed below. If you have questions

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

BUSINESS ANALYTICS. 96 HOURS Practical Learning. DexLab Certified. Training Module. Gurgaon (Head Office)

BUSINESS ANALYTICS. 96 HOURS Practical Learning. DexLab Certified. Training Module. Gurgaon (Head Office) SAS (Base & Advanced) Analytics & Predictive Modeling Tableau BI 96 HOURS Practical Learning WEEKDAY & WEEKEND BATCHES CLASSROOM & LIVE ONLINE DexLab Certified BUSINESS ANALYTICS Training Module Gurgaon

More information

Scoring Models, Probability Transformations & Model Calibration Using SAS

Scoring Models, Probability Transformations & Model Calibration Using SAS Scoring Models, Probability Transformations & Model Calibration Using SAS Ilan Benamara Krzysztof Dzieciolowski Rogers Communications October 16, 2013 SAS Data Mining Forum, Toronto Questions How can we

More information

PROJECT 1 DATA ANALYSIS (KR-VS-KP)

PROJECT 1 DATA ANALYSIS (KR-VS-KP) PROJECT 1 DATA ANALYSIS (KR-VS-KP) Author: Tomáš Píhrt (xpiht00@vse.cz) Date: 12. 12. 2015 Contents 1 Introduction... 1 1.1 Data description... 1 1.2 Attributes... 2 1.3 Data pre-processing & preparation...

More information

Comparative analysis of data mining methods for predicting credit default probabilities in a retail bank portfolio

Comparative analysis of data mining methods for predicting credit default probabilities in a retail bank portfolio Comparative analysis of data mining methods for predicting credit default probabilities in a retail bank portfolio Adela Ioana Tudor, Adela Bâra, Simona Vasilica Oprea Department of Economic Informatics

More information

Data Mining with SPSS Modeler

Data Mining with SPSS Modeler Tilo Wendler Soren Grottrup Data Mining with SPSS Modeler Theory, Exercises and Solutions Springer 1 Introduction 1 1.1 The Concept of the SPSS Modeler 2 1.2 Structure and Features of This Book 5 1.2.1

More information

Oracle Machine Learning Notebook

Oracle Machine Learning Notebook Oracle Machine Learning Notebook Included in Autonomous Data Warehouse Cloud Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics charlie.berger@oracle.com

More information

Data Mining. Lab Exercises

Data Mining. Lab Exercises Data Mining Lab Exercises Predictive Modelling Purpose The purpose of this study is to learn how data mining methods / tools (SAS System/ SAS Enterprise Miner) can be used to solve predictive modeling

More information

Intermediate SAS: Statistics

Intermediate SAS: Statistics Intermediate SAS: Statistics OIT TSS 293-4444 oithelp@mail.wvu.edu oit.wvu.edu/training/classmat/sas/ Table of Contents Procedures... 2 Two-sample t-test:... 2 Paired differences t-test:... 2 Chi Square

More information

Data Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier

Data Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio

More information

Classification and Regression

Classification and Regression Classification and Regression Announcements Study guide for exam is on the LMS Sample exam will be posted by Monday Reminder that phase 3 oral presentations are being held next week during workshops Plan

More information

CSE4334/5334 DATA MINING

CSE4334/5334 DATA MINING CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy

More information

Preserving your SAS Environment in a Non-Persistent World. A Detailed Guide to PROC PRESENV. Steven Gross, Wells Fargo, Irving, TX

Preserving your SAS Environment in a Non-Persistent World. A Detailed Guide to PROC PRESENV. Steven Gross, Wells Fargo, Irving, TX Preserving your SAS Environment in a Non-Persistent World A Detailed Guide to PROC PRESENV Steven Gross, Wells Fargo, Irving, TX ABSTRACT For Enterprise Guide users, one of the challenges often faced is

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING Clustering Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu November 7, 2017 Learnt Clustering Methods Vector Data Set Data Sequence Data Text

More information

Big Data Analytics The Data Mining process. Roger Bohn March. 2016

Big Data Analytics The Data Mining process. Roger Bohn March. 2016 1 Big Data Analytics The Data Mining process Roger Bohn March. 2016 Office hours HK thursday5 to 6 in the library 3115 If trouble, email or Slack private message. RB Wed. 2 to 3:30 in my office Some material

More information

What does SAS Enterprise Miner do? For whom is SAS Enterprise Miner designed?

What does SAS Enterprise Miner do? For whom is SAS Enterprise Miner designed? FACT SHEET SAS Enterprise Miner Create highly accurate analytical models that enable you to predict with confidence What does SAS Enterprise Miner do? It streamlines the data mining process so you can

More information

Debugging. Where to start? John Ladds, SAS Technology Center, Statistics Canada.

Debugging. Where to start? John Ladds, SAS Technology Center, Statistics Canada. Debugging Where to start? John Ladds, SAS Technology Center, Statistics Canada Come out of the desert of ignorance to the OASUS of knowledge Did it work? I don t see any red. So it must have worked, right?

More information

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Abstract Deciding on which algorithm to use, in terms of which is the most effective and accurate

More information

SAS Visual Analytics 8.1: Getting Started with Analytical Models

SAS Visual Analytics 8.1: Getting Started with Analytical Models SAS Visual Analytics 8.1: Getting Started with Analytical Models Using This Book Audience This book covers the basics of building, comparing, and exploring analytical models in SAS Visual Analytics. The

More information

A Side of Hash for You To Dig Into

A Side of Hash for You To Dig Into A Side of Hash for You To Dig Into Shan Ali Rasul, Indigo Books & Music Inc, Toronto, Ontario, Canada. ABSTRACT Within the realm of Customer Relationship Management (CRM) there is always a need for segmenting

More information