Data Preparation for Analytics

1 Data Preparation for Analytics. Dr. Gerhard Svolba, SAS-Austria, Vienna

2 Agenda: Data Preparation for Analytics, General Thoughts; Data Structures for Analytics; Case Study: Building an efficient one-row-per-subject data mart; Data Management with the SAS Language and with SAS Tools; Closing Thoughts

3 Some Words on Data Preparation: Is for techies; Is just coding; Is boring; Consumes up to 80% of the project; Was not in the focus of data mining literature so far; Is something that SAS can do excellently; Is vital to the quality of the project

4 The Analysis Process: From Raw Data to Actionable Results
Data Sources: different data sources; relational models, star schemas
Data Preparation: merges, denormalisation; derived variables; transpositions, aggregations
Analytic Modeling: modeling, parameter estimation, tuning; predictions, classifications, clustering
Results and Actions: usage of results; profiling; interpretations
Data Availability + Adequate Preparation + Clever Modeling = Good Results

5 Four Dimensions for Analytic Data Preparation: Business and Process Knowledge; Analytical Knowledge; Efficient and Clever SAS Coding; Documentation and Maintenance

6 Four Dimensions for Analytic Data Preparation: Business and Process Knowledge; Analytical Knowledge; Efficient and Clever SAS Coding; Documentation and Maintenance

7 Challenging a Business Question: Can you calculate the probability that a customer will cancel the usage of product X? Questions: Cancellation or decline in usage? How do we treat usage of more advanced products? Include involuntary cancellation? How quickly can retention measures take place? Do you want probabilities per customer, a list of the high-risk customers, or risk classes in general?

8 Business, Statistician and IT: The Optimal Triangle!?
Simon, Retention Manager: "I have formulated my business question. Are there any reasons not to be back with results in 2 days?"
Daniele, Quantitative Expert: "I would like to have an analytic database with a high number of attributes and an environment to perform both data management and analysis."
Elias, IT Expert: "Name me the list of attributes and derived variables that you will use in your final model and which I have to deliver periodically."

9 Four Dimensions for Analytic Data Preparation: Business and Process Knowledge; Analytical Knowledge; Efficient and Clever SAS Coding; Documentation and Maintenance

10 Business Questions, Analytical Models: Event prediction (Churn, Fraud, Delinquency, Response, ...); Value prediction (Purchase Size, Claim Amount, ...); Clustering (Segmentation, ...); Market Basket Analysis (Association Analysis, ...); Time Series Forecasting

11 Analysis Subjects and Multiple Observations: Analysis subjects are the entities that are analyzed; the analysis results are interpreted in their context. Multiple observations per analysis subject arise from repeated measurements over time or from hierarchical relationships.

12 Main Types of Data Marts: One-Row-per-Subject Data Mart; Multiple-Row-per-Subject Data Mart; Longitudinal Data Mart

13 Data Mart Structures
Data mart structure for the analysis, by structure of the source data: do multiple observations per analysis subject exist?
One-Row-per-Subject Data Mart. NO: a data mart with one row per subject is created. YES: the information from the multiple observations has to be aggregated or transposed per analysis subject.
Multiple-Row-per-Subject Data Mart. NO: (key-value table). YES: a data mart with one row per multiple observation is created; variables at the analysis subject level are duplicated for each repetition.

14 The One-Row-Per-Subject Data Mart: Required by many statistical methods (Regression Analysis, Neural Networks, Decision Trees, Survival Analysis, Cluster Analysis, ...). Most prominent data mart structure in data mining: Event prediction (Churn, Fraud, Delinquency, Response, ...); Value prediction (Purchase Size, Claim Amount, ...); Segmentation (Clustering, ...)

15 The One-Row-Per-Subject Paradigm

16 Transposing Data to One-Row-Per-Subject
PROC TRANSPOSE DATA = long
               OUT = wide(DROP = _name_)
               PREFIX = weight;
  BY id;
  VAR weight;
  ID time;
RUN;
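To make the input and output shapes of this transpose concrete, here is a small self-contained sketch. The data values are invented for illustration; only the data set and variable names (long, wide, id, time, weight) follow the slide.

```sas
/* Illustrative data (invented values): one row per subject and time point */
DATA long;
  INPUT id time weight;
  DATALINES;
1 1 77
1 2 75
1 3 74
2 1 82
2 2 83
2 3 81
;
RUN;

/* BY-group processing expects the input to be sorted by the BY variable */
PROC SORT DATA = long;
  BY id time;
RUN;

PROC TRANSPOSE DATA = long
               OUT = wide(DROP = _name_)
               PREFIX = weight;
  BY id;
  VAR weight;
  ID time;
RUN;

/* wide now holds one row per id with the columns weight1, weight2, weight3 */
PROC PRINT DATA = wide;
RUN;
```

The ID statement takes the column suffix from the values of time, so a subject with a missing time point simply gets a missing value in the corresponding column.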

17 Clever Aggregations. Interval Data: Static Aggregation; Correlation of Values; Course over Time; Concentration of Values. Categorical Data: Frequency Counts; Concatenated Frequencies; Total and Distinct Counts

18 Correlation of Values: How do the monthly values per subject correlate with the overall mean per month?
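One possible way to derive such a correlation measure is sketched below. It is not taken from the slides: the input table longitud(CustID, month, usage) is borrowed from the later PROC REG slide, and the output names usage_with_mean, corr_raw, corr_per_subject and corr_with_mean are hypothetical.

```sas
/* Attach the overall mean per month to every customer-month row */
PROC SQL;
  CREATE TABLE usage_with_mean AS
  SELECT a.CustID, a.month, a.usage, b.mean_usage
  FROM longitud AS a
       INNER JOIN
       (SELECT month, MEAN(usage) AS mean_usage
        FROM longitud
        GROUP BY month) AS b
       ON a.month = b.month
  ORDER BY a.CustID, a.month;
QUIT;

/* Pearson correlation between usage and the monthly mean, per customer */
PROC CORR DATA = usage_with_mean NOPRINT
          OUTP = corr_raw;
  BY CustID;
  VAR usage;
  WITH mean_usage;
RUN;

/* Keep only the correlation rows: one row per customer */
DATA corr_per_subject;
  SET corr_raw;
  WHERE _TYPE_ = 'CORR';
  KEEP CustID usage;
  RENAME usage = corr_with_mean;
RUN;
```

A customer whose usage is constant over all months gets a missing correlation, which may need separate treatment before modeling.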

19 Measures for the Course over Time
/* Long-term trend: slope of usage over all months, per customer */
PROC REG DATA = longitud NOPRINT
         OUTEST = Est_LongTerm(KEEP = CustID month RENAME = (month = longterm));
  MODEL usage = month;
  BY CustID;
RUN;
/* Short-term trend: same regression, restricted to months 5 and 6 */
PROC REG DATA = longitud NOPRINT
         OUTEST = Est_ShortTerm(KEEP = CustID month RENAME = (month = shortterm));
  MODEL usage = month;
  BY CustID;
  WHERE month IN (5 6);
RUN;

20 Concentration of Values: Concentration = (sum of the top 50% of sub-hierarchies) / (total sum over all sub-hierarchies)
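The ratio above could be computed per customer along the following lines. This is a sketch under assumptions, not the author's code: the input table accounts(CustID, AccountID, Balance) and all derived names are hypothetical, and with an odd number of sub-hierarchies the middle entry is counted as part of the top half.

```sas
/* Largest balances first within each customer */
PROC SORT DATA = accounts OUT = accounts_sorted;
  BY CustID DESCENDING Balance;
RUN;

/* Number of sub-hierarchies and total sum per customer */
PROC SQL;
  CREATE TABLE account_totals AS
  SELECT CustID, COUNT(*) AS n_sub, SUM(Balance) AS total
  FROM accounts
  GROUP BY CustID
  ORDER BY CustID;
QUIT;

DATA concentration;
  MERGE accounts_sorted account_totals;
  BY CustID;
  IF FIRST.CustID THEN DO;
    pos = 0;
    top_sum = 0;
  END;
  pos + 1;
  /* Assumption: with an odd count, the middle entry belongs to the top 50% */
  IF pos <= CEIL(n_sub / 2) THEN top_sum + Balance;
  IF LAST.CustID THEN DO;
    IF total > 0 THEN Concentration = top_sum / total;
    OUTPUT;
  END;
  KEEP CustID Concentration;
RUN;
```

For non-negative balances, values near 0.5 indicate an even spread over the sub-hierarchies, while values near 1 indicate that a few of them carry almost the whole sum.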

21 Categorical Variables: Frequency Counts (figure panels: Source Data; Absolute and Relative Frequencies; Counts and Distinct Counts)
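The three panels can be approximated with standard procedures. The sketch below assumes a hypothetical input table accounts(CustID, AccountType) with one row per account; the output names are likewise invented for illustration.

```sas
/* Total and distinct counts per customer */
PROC SQL;
  CREATE TABLE account_counts AS
  SELECT CustID,
         COUNT(*) AS n_accounts,
         COUNT(DISTINCT AccountType) AS n_account_types
  FROM accounts
  GROUP BY CustID
  ORDER BY CustID;
QUIT;

/* Absolute frequencies (COUNT) and row percentages (PCT_ROW)
   per customer and account type, in long format */
PROC FREQ DATA = accounts NOPRINT;
  TABLE CustID * AccountType / OUT = freq_long OUTPCT;
RUN;

/* One column per account type holding the absolute frequency */
PROC TRANSPOSE DATA = freq_long
               OUT = freq_abs(DROP = _name_ _label_)
               PREFIX = n_;
  BY CustID;
  VAR count;
  ID AccountType;
RUN;
```

Account types a customer does not hold come out as missing rather than 0 in freq_abs, so a follow-up step may be needed to set them to zero. Transposing PCT_ROW instead of COUNT gives the relative frequencies.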

22 Categorical Variables: Concatenated Frequencies (table: frequency distribution of the concatenated variable Account_RowPct with the columns Frequency, Percent, Cumulative Frequency, Cumulative Percent)

23 Hierarchies: Aggregating Up, Copying Down

24 Main Types of Data Marts: One-Row-per-Subject Data Mart; Multiple-Row-per-Subject Data Mart; Longitudinal Data Mart

25 The standard form for longitudinal data; the interleaved longitudinal data mart

26 The cross-sectional dimension data mart

27 Longitudinal Data Marts: Aggregating data at the right level (transactional data, finest granularity, most appropriate aggregation level); Defining cross-sectional groups; Alignment of time values; Creation of event indicators and input variables

28 Four Dimensions for Analytic Data Preparation: Business and Process Knowledge; Analytical Knowledge; Efficient and Clever SAS Coding; Documentation and Maintenance

29 Case Study: Business Question. Predict customers that have a high probability of leaving the company. Derive the target variable Cancellation YES/NO from the monthly value segment history (entry "8. LOST"). Create a one-row-per-subject data mart for data mining analysis.

30 Case Study: Data and Data Model. Customer data: demographic and customer baseline data; Account data: information on customer accounts; Leasing data: data on leasing information; Call Center data: data on call center contacts; Score data: data on value segment scores

31 Considerations for Predictive Modeling. Snapshot Date: June 30th, 2003. Only consider customers that are still active at this point in time. Do not use input data after this point in time. Define Cancellation YES/NO based on values in the target window. (Timeline figure: Observation Window, Offset, Target Window; months April 2003 to August 2003.)

32 Using Data from the CALLCENTER Table
%LET snapdate = '30JUN2003'd;
/* Number of complaints per customer up to the snapshot date */
PROC FREQ DATA = callcenter NOPRINT;
  TABLE CustID / OUT = CallCenterComplaints(DROP = Percent RENAME = (Count = Complaints));
  WHERE Category = 'Complaint' AND DATEPART(date) <= &snapdate;
RUN;

33 Using Data from the SCORES Table
%LET snapdate = '30JUN2003'd;
DATA ScoreFuture(RENAME = (ValueSegment = FutureValueSegment))
     ScoreActual
     ScoreLastMonth(RENAME = (ValueSegment = LastValueSegment));
  SET Scores;
  /* Align each score date to the end of its month */
  Date = INTNX('MONTH', Date, 0, 'END');
  DROP Date;
  /* The comparison dates use the same 'END' alignment as Date */
  IF Date = &snapdate THEN OUTPUT ScoreActual;
  ELSE IF Date = INTNX('MONTH', &snapdate, -1, 'END') THEN OUTPUT ScoreLastMonth;
  ELSE IF Date = INTNX('MONTH', &snapdate, 2, 'END') THEN OUTPUT ScoreFuture;
RUN;
(Timeline figure: Observation Window, Offset, Target Window; months April 2003 to August 2003.)

34
DATA CustomerMart;
  ATTRIB /* Customer Baseline */
    CustID        FORMAT = 8.     LABEL = "Customer ID"
    Birthdate     FORMAT = DATE9. LABEL = "Date of Birth"
    Alter         FORMAT = 8.     LABEL = "Age (years)"
    Gender        FORMAT = $6.    LABEL = "Gender"
    MaritalStatus FORMAT = $10.   LABEL = "Marital Status"
    Title         FORMAT = $10.   LABEL = "Academic Title"
    HasTitle      FORMAT = 8.     LABEL = "Has Title? 0/1"
    Branch        FORMAT = $5.    LABEL = "Branch Name";
  MERGE Customer (IN = InCustomer)
        AccountSum (IN = InAccounts)
        AccountTypes
        LeasingSum (IN = InLeasing)
        CallCenterContacts (IN = InCallCenter)
        CallCenterComplaints
        ScoreFuture
        ScoreActual
        ScoreLastMonth;
  BY CustID;
  IF InCustomer;

35
  /* Customer Baseline */
  HasTitle = (Title ne "");
  Alter = (&Snapdate - Birthdate) / 365.25;
  CustomerMonths = (&Snapdate - CustomerSince) / (365.25 / 12);
  /* Accounts */
  HasAccounts = InAccounts;
  LoanPct = Loan / BalanceSum * 100;
  SavingAccountPct = SavingAccount / BalanceSum * 100;
  FundsPct = Funds / BalanceSum * 100;
  /* Leasing */
  HasLeasing = InLeasing;
  /* Call Center */
  HasCallCenter = InCallCenter;
  ComplaintPct = Complaints / Calls * 100;
  /* Value Segment */
  Cancel = (FutureValueSegment = '8. LOST');
  /* 1 if the value segment differs from last month's segment */
  ChangeValueSegment = (ValueSegment ne LastValueSegment);
RUN;

36 Screenshots of the Resulting Data Mart

37 The Power of the SAS Language for Data Preparation

38 SAS Data Step vs. SQL: Transposing a table
PROC TRANSPOSE DATA = sampsio.assocs(OBS = 21)
               OUT = assoc_tp(DROP = _name_);
  BY customer;
  ID Product;
RUN;

39 Changing between longitudinal data structures: Standard Form, Interleaved

40 Selection of the first and last line per customer (table columns: CustID, Month, Value, FirstValue, LastValue)
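The first and last row per customer can be identified in a DATA step with BY-group processing. The sketch below assumes a hypothetical input table longitud(CustID, Month, Value) and treats FirstValue and LastValue as 0/1 indicators, which is one plausible reading of the column headers.

```sas
PROC SORT DATA = longitud OUT = longitud_sorted;
  BY CustID Month;
RUN;

DATA first_last;
  SET longitud_sorted;
  BY CustID;
  FirstValue = FIRST.CustID;  /* 1 on the first row of each customer, else 0 */
  LastValue  = LAST.CustID;   /* 1 on the last row of each customer, else 0 */
RUN;

/* Keep only the first and last row per customer */
DATA first_last_only;
  SET first_last;
  WHERE FirstValue = 1 OR LastValue = 1;
RUN;
```

FIRST. and LAST. are temporary variables that exist only inside the DATA step, which is why they are copied into regular variables here.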

41 Creating a sequential number and summing items (table columns: CustID, Date, Points, PurchNo, Cum_Poi)
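A running purchase number and a cumulative sum per customer can be built with sum statements, as in the following sketch. The input table purchases(CustID, Date, Points) is hypothetical, and the name Cum_Points is an assumption, since the column header on the slide is cut off.

```sas
PROC SORT DATA = purchases OUT = purchases_sorted;
  BY CustID Date;
RUN;

DATA purchases_numbered;
  SET purchases_sorted;
  BY CustID;
  /* Restart both counters at the first row of each customer */
  IF FIRST.CustID THEN DO;
    PurchNo = 0;
    Cum_Points = 0;
  END;
  PurchNo + 1;          /* sum statement: value is retained across rows */
  Cum_Points + Points;  /* missing Points are treated as 0 */
RUN;
```

The sum statement (variable + expression) retains its value automatically, so no RETAIN statement is needed.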

42 Copying omitted data (before and after tables with the columns CustID, Age, Gender, Month, Value; Age and Gender, given only once per customer, are copied to all rows of that customer)
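Copying values down within a customer is typically done with RETAIN. The sketch below assumes a hypothetical input table sparse(CustID, Age, Gender, Month, Value), sorted by CustID and Month, in which Age and Gender are filled only on the first row per customer; the helper names Age_fill and Gender_fill are invented.

```sas
DATA filled(DROP = Age Gender
            RENAME = (Age_fill = Age Gender_fill = Gender));
  SET sparse;
  BY CustID;
  LENGTH Gender_fill $ 6;   /* assumed length for the character variable */
  RETAIN Age_fill Gender_fill;
  /* Do not carry values over from the previous customer */
  IF FIRST.CustID THEN CALL MISSING(Age_fill, Gender_fill);
  /* Remember the most recent non-missing value */
  IF NOT MISSING(Age) THEN Age_fill = Age;
  IF NOT MISSING(Gender) THEN Gender_fill = Gender;
RUN;
```

Separate helper variables are used because variables read by SET are overwritten on every row, so retaining Age and Gender directly would not preserve the earlier values.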

43 Shift down a column for k positions (table columns: Date, Value, Value_PrevDay)
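Shifting a column down by k rows corresponds to the LAGk function in a DATA step. The sketch assumes a hypothetical input table daily(Date, Value) sorted by Date; Value_Prev3 is an invented name for the k = 3 case.

```sas
DATA shifted;
  SET daily;
  Value_PrevDay = LAG(Value);   /* k = 1: value of the previous row */
  Value_Prev3   = LAG3(Value);  /* k = 3: value from three rows earlier */
RUN;
```

LAG returns the value from an earlier call of the function, not from an earlier row as such, so it should be called on every row rather than inside an IF condition. The first k rows receive missing values.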

44 Analytic Data Preparation in SAS Enterprise Miner

45 Analytic Data Preparation in SAS Enterprise Miner: Input Data Source Node; Sample Node; Data Partition Node; Metadata Node; Filter Node; Transform Variables Node; Impute Node; SAS Code Node; Principal Components Node

46 Working with Multiple-Row-per-Subject Data in SAS Enterprise Miner: Time Series Node; Association Node; Merge Node

47 Extending SAS Enterprise Miner with Data Preparation for Analytics Nodes. Extension Nodes: Correlation; TrendRegression; Concentration; CategoryCount

48 Analytic Data Preparation in SAS Enterprise Miner, Summary: Documentation of the data preparation process as a whole in the process flow diagram; Definition of the variable metadata used in the analysis; Automatic creation of dummy variables for categorical data; Powerful data transformations in the Filter node, Transform Variables node and Impute node; Possibility to perform association analysis and time series analysis; Creation of score code from all nodes in SAS Enterprise Miner as SAS DATA step code

49 Analytic Data Preparation in SAS Forecast Studio

50 Analytic Data Preparation in SAS Forecast Studio: Convert transactional data into time series data; Treatment of missing values; Alignment of date values in intervals; Aggregation of data on various levels; Handling of events. Event handling allows users to create event definitions, assign events to selected series in the project, and delete events. Users can specify event duration, shape, and recurrence options. Pre-defined common events and holidays are available for inclusion in the forecasting models.

51 Four Dimensions for Analytic Data Preparation: Business and Process Knowledge; Analytical Knowledge; Efficient and Clever SAS Coding; Documentation and Maintenance

52 SAS Data Integration Studio

53 SAS Data Integration Studio

54 SAS Data Integration Studio. General Features: Comprehensive transformation library; Graphical user interface (drag & drop, wizard driven); Multi-developer support (check-in/check-out, change management); Impact analysis; Documentation of the data management process. Data Mining Model Scoring: Register model in SAS metadata repository; Apply SAS Enterprise Miner model to new data source; Creates the target table definition in the metadata

55 SAS DI Studio Analytic Features. Data Mining Model Scoring: Register model in SAS metadata repository; Apply SAS Enterprise Miner model to new data source; Creates the target table definition in the metadata. Forecast Analysis Transformation: Performs time series analysis; Based on the HPF (high performance forecasting) procedure; Integrates the forecast step into the data flow process in the metadata

56 Summary: Analytic Data Preparation is a discipline, not an incommodious necessity. Analytic Data Preparation is more than just coding. The One-Row-Per-Subject Paradigm is central in data mining and predictive modeling. Do not stop with simple transposing, summing or averaging: clever aggregations can be the key success factor. Predictive Modeling and Historic Data: Which data are you allowed to use for modeling? SAS combines powerful data management functionality and market-leading analytics in one integrated package. SAS tools like SAS DI Studio, SAS Enterprise Miner and SAS Forecast Studio assist you in analytic data preparation.

57 The commercial break

58 "It is exciting to see a book completely devoted to data preparation in SAS." Jin Li, Statistician, Capital One Financial Services
"This is a must read for anyone who prepares data for data mining and analytics. Not only you receive ideas and suggestions for important derived variables for your datamart, you also get a lot of insight on the business rationale behind the scenes. The author does a great job in explaining you step by step the world of data preparation for data mining and shows a lot of example code and macros." Christine Hallwirth, MHE & Partners
"Gerhard Svolba's book 'Data Preparation for Analytics' is an 'owner's manual' for data miners, business analysts and all who prefer to be in charge of and responsible for their own datasets. This book is for those who are not afraid of data, who understand data, and for whom rolling up their sleeves and getting their hands into data is an integral part of analytics and predictive modeling." Lessia Shajenko (American Bank)

59 Recommended Reading: Data Preparation for Analytics, by Gerhard Svolba, SAS-Press (#60502). Business Rationale; Concepts; Coding Examples

60 Downloads Data Preparation for Analytics

61 Questions and Contact Dr. Gerhard Svolba

The Consequences of Poor Data Quality on Model Accuracy

The Consequences of Poor Data Quality on Model Accuracy The Consequences of Poor Data Quality on Model Accuracy Dr. Gerhard Svolba SAS Austria Cologne, June 14th, 2012 From this talk you can expect The analytical viewpoint on data quality Answers to the questions

More information

SAS Enterprise Miner Extension Nodes by Gerhard Svolba

SAS Enterprise Miner Extension Nodes by Gerhard Svolba SAS Enterprise Miner Extension Nodes by Gerhard Svolba Data Preparation and other useful tools (GTOOLS) Dr. Gerhard Svolba SAS-Austria Email: sastools.by.gerhard@gmx.at Data Preparation for Analytics http://www.sascommunity.org/wiki/data_preparation_for_analytics

More information

Data Acquisition and Integration

Data Acquisition and Integration CHAPTER Data Acquisition and Integration 2 2.1 INTRODUCTION This chapter first provides a brief review of data sources and types of variables from the point of view of data mining. Then it presents the

More information

Now, Data Mining Is Within Your Reach

Now, Data Mining Is Within Your Reach Clementine Desktop Specifications Now, Data Mining Is Within Your Reach Data mining delivers significant, measurable value. By uncovering previously unknown patterns and connections in data, data mining

More information

SAS Enterprise Miner : Tutorials and Examples

SAS Enterprise Miner : Tutorials and Examples SAS Enterprise Miner : Tutorials and Examples SAS Documentation February 13, 2018 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2017. SAS Enterprise Miner : Tutorials

More information

Enterprise Miner Tutorial Notes 2 1

Enterprise Miner Tutorial Notes 2 1 Enterprise Miner Tutorial Notes 2 1 ECT7110 E-Commerce Data Mining Techniques Tutorial 2 How to Join Table in Enterprise Miner e.g. we need to join the following two tables: Join1 Join 2 ID Name Gender

More information

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA INSIGHTS@SAS: ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA AGENDA 09.00 09.15 Intro 09.15 10.30 Analytics using SAS Enterprise Guide Ellen Lokollo 10.45 12.00 Advanced Analytics using SAS

More information

Tessera Rapid Modeling Environment: Production-Strength Data Mining Solution for Terabyte-Class Relational Data Warehouses

Tessera Rapid Modeling Environment: Production-Strength Data Mining Solution for Terabyte-Class Relational Data Warehouses Tessera Rapid ing Environment: Production-Strength Data Mining Solution for Terabyte-Class Relational Data Warehouses Michael Nichols, John Zhao, John David Campbell Tessera Enterprise Systems RME Purpose

More information

DATA MINING AND WAREHOUSING

DATA MINING AND WAREHOUSING DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

More information

Data Mining Overview. CHAPTER 1 Introduction to SAS Enterprise Miner Software

Data Mining Overview. CHAPTER 1 Introduction to SAS Enterprise Miner Software 1 CHAPTER 1 Introduction to SAS Enterprise Miner Software Data Mining Overview 1 Layout of the SAS Enterprise Miner Window 2 Using the Application Main Menus 3 Using the Toolbox 8 Using the Pop-Up Menus

More information

Data Science. Data Analyst. Data Scientist. Data Architect

Data Science. Data Analyst. Data Scientist. Data Architect Data Science Data Analyst Data Analysis in Excel Programming in R Introduction to Python/SQL/Tableau Data Visualization in R / Tableau Exploratory Data Analysis Data Scientist Inferential Statistics &

More information

SAS Factory Miner 14.2: User s Guide

SAS Factory Miner 14.2: User s Guide SAS Factory Miner 14.2: User s Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. SAS Factory Miner 14.2: User s Guide. Cary, NC: SAS Institute

More information

Data Analysis and Data Science

Data Analysis and Data Science Data Analysis and Data Science CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/29/15 Agenda Check-in Online Analytical Processing Data Science Homework 8 Check-in Online Analytical

More information

Learning Objectives for Data Concept and Visualization

Learning Objectives for Data Concept and Visualization Learning Objectives for Data Concept and Visualization Assignment 1: Data Quality Concept and Impact of Data Quality Summarize concepts of data quality. Understand and describe the impact of data on actuarial

More information

Data Warehouses Chapter 12. Class 10: Data Warehouses 1

Data Warehouses Chapter 12. Class 10: Data Warehouses 1 Data Warehouses Chapter 12 Class 10: Data Warehouses 1 OLTP vs OLAP Operational Database: a database designed to support the day today transactions of an organization Data Warehouse: historical data is

More information

Guide Users along Information Pathways and Surf through the Data

Guide Users along Information Pathways and Surf through the Data Guide Users along Information Pathways and Surf through the Data Stephen Overton, Overton Technologies, LLC, Raleigh, NC ABSTRACT Business information can be consumed many ways using the SAS Enterprise

More information

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect BEOP.CTO.TP4 Owner: OCTO Revision: 0001 Approved by: JAT Effective: 08/30/2018 Buchanan & Edwards Proprietary: Printed copies of

More information

OLAP Introduction and Overview

OLAP Introduction and Overview 1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata

More information

Taking Your Application Design to the Next Level with Data Mining

Taking Your Application Design to the Next Level with Data Mining Taking Your Application Design to the Next Level with Data Mining Peter Myers Mentor SolidQ Australia HDNUG 24 June, 2008 WHO WE ARE Industry experts: Growing, elite group of over 90 of the world s best

More information

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Topics AGENDA Challenges with Big Data Analytics How SAS can help you to minimize time to value with

More information

ENTERPRISE MINER: 1 DATA EXPLORATION AND VISUALISATION

ENTERPRISE MINER: 1 DATA EXPLORATION AND VISUALISATION ENTERPRISE MINER: 1 DATA EXPLORATION AND VISUALISATION JOZEF MOFFAT, ANALYTICS & INNOVATION PRACTICE, SAS UK 10, MAY 2016 DATA EXPLORATION AND VISUALISATION AGENDA SAS Webinar 10th May 2016 at 10:00 AM

More information

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:

More information

SESUG 2014 IT-82 SAS-Enterprise Guide for Institutional Research and Other Data Scientists Claudia W. McCann, East Carolina University.

SESUG 2014 IT-82 SAS-Enterprise Guide for Institutional Research and Other Data Scientists Claudia W. McCann, East Carolina University. Abstract Data requests can range from on-the-fly, need it yesterday, to extended projects taking several weeks or months to complete. Often institutional researchers and other data scientists are juggling

More information

SAS (Statistical Analysis Software/System)

SAS (Statistical Analysis Software/System) SAS (Statistical Analysis Software/System) SAS Adv. Analytics or Predictive Modelling:- Class Room: Training Fee & Duration : 30K & 3 Months Online Training Fee & Duration : 33K & 3 Months Learning SAS:

More information

Note: In the presentation I should have said "baby registry" instead of "bridal registry," see

Note: In the presentation I should have said baby registry instead of bridal registry, see Q-and-A from the Data-Mining Webinar Note: In the presentation I should have said "baby registry" instead of "bridal registry," see http://www.target.com/babyregistryportalview Q: You mentioned the 'Big

More information

The DMSPLIT Procedure

The DMSPLIT Procedure The DMSPLIT Procedure The DMSPLIT Procedure Overview Procedure Syntax PROC DMSPLIT Statement FREQ Statement TARGET Statement VARIABLE Statement WEIGHT Statement Details Examples Example 1: Creating a Decision

More information

SAS Enterprise Miner 7.1

SAS Enterprise Miner 7.1 SAS Enterprise Miner 7.1 Data Mining using SAS IASRI Satyajit Dwivedi Transforming the World DATA MINING SEMMA Process Sample Explore Modify Model Assess Utility 2 SEMMA Process - Creating Library Select

More information

Overview. Data Mining for Business Intelligence. Shmueli, Patel & Bruce

Overview. Data Mining for Business Intelligence. Shmueli, Patel & Bruce Overview Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Core Ideas in Data Mining Classification Prediction Association Rules Data Reduction Data Exploration

More information

INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN...

INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN... INTRODUCTION... 2 WHAT IS DATA MINING?... 2 HOW TO ACHIEVE DATA MINING... 2 THE ROLE OF DARWIN... 3 FEATURES OF DARWIN... 4 USER FRIENDLY... 4 SCALABILITY... 6 VISUALIZATION... 8 FUNCTIONALITY... 10 Data

More information

2997 Yarmouth Greenway Drive, Madison, WI Phone: (608) Web:

2997 Yarmouth Greenway Drive, Madison, WI Phone: (608) Web: Getting the Most Out of SAS Enterprise Guide 2997 Yarmouth Greenway Drive, Madison, WI 53711 Phone: (608) 278-9964 Web: www.sys-seminar.com 1 Questions, Comments Technical Difficulties: Call 1-800-263-6317

More information

Predictive Modeling with SAS Enterprise Miner

Predictive Modeling with SAS Enterprise Miner Predictive Modeling with SAS Enterprise Miner Practical Solutions for Business Applications Third Edition Kattamuri S. Sarma, PhD Solutions to Exercises sas.com/books This set of Solutions to Exercises

More information

Tasks Menu Reference. Introduction. Data Management APPENDIX 1

Tasks Menu Reference. Introduction. Data Management APPENDIX 1 229 APPENDIX 1 Tasks Menu Reference Introduction 229 Data Management 229 Report Writing 231 High Resolution Graphics 232 Low Resolution Graphics 233 Data Analysis 233 Planning Tools 235 EIS 236 Remote

More information

TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended.

TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended. Previews of TDWI course books are provided as an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews can not be printed. TDWI strives

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

Choosing the Right Procedure

Choosing the Right Procedure 3 CHAPTER 1 Choosing the Right Procedure Functional Categories of Base SAS Procedures 3 Report Writing 3 Statistics 3 Utilities 4 Report-Writing Procedures 4 Statistical Procedures 5 Efficiency Issues

More information

Slice Intelligence!

Slice Intelligence! Intern @ Slice Intelligence! Wei1an(Wu( September(8,(2014( Outline!! Details about the job!! Skills required and learned!! My thoughts regarding the internship! About the company!! Slice, which we call

More information

Now That You Have Your Data in Hadoop, How Are You Staging Your Analytical Base Tables?

Now That You Have Your Data in Hadoop, How Are You Staging Your Analytical Base Tables? Paper SAS 1866-2015 Now That You Have Your Data in Hadoop, How Are You Staging Your Analytical Base Tables? Steven Sober, SAS Institute Inc. ABSTRACT Well, Hadoop community, now that you have your data

More information

Oracle9i Data Mining. Data Sheet August 2002

Oracle9i Data Mining. Data Sheet August 2002 Oracle9i Data Mining Data Sheet August 2002 Oracle9i Data Mining enables companies to build integrated business intelligence applications. Using data mining functionality embedded in the Oracle9i Database,

More information

Lasso Your Business Users by Designing Information Pathways to Optimize Standardized Reporting in SAS Visual Analytics

Lasso Your Business Users by Designing Information Pathways to Optimize Standardized Reporting in SAS Visual Analytics Paper 2960-2015 Lasso Your Business Users by Designing Information Pathways to Optimize Standardized Reporting in SAS Visual Analytics ABSTRACT Stephen Overton, Zencos Consulting SAS Visual Analytics opens

More information

SAS IT Resource Management Forecasting. Setup Specification Document. A SAS White Paper

SAS IT Resource Management Forecasting. Setup Specification Document. A SAS White Paper SAS IT Resource Management Forecasting Setup Specification Document A SAS White Paper Table of Contents Introduction to SAS IT Resource Management Forecasting... 1 Getting Started with the SAS Enterprise

More information

SAS Data Integration Studio 3.3. User s Guide

SAS Data Integration Studio 3.3. User s Guide SAS Data Integration Studio 3.3 User s Guide The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2006. SAS Data Integration Studio 3.3: User s Guide. Cary, NC: SAS Institute

More information

Analytics and Visualization

Analytics and Visualization GU I DE NO. 4 Analytics and Visualization AWS IoT Analytics Mini-User Guide Introduction As IoT applications scale, so does the data generated from these various IoT devices. This data is raw, unstructured,

More information

Oracle Big Data Discovery

Oracle Big Data Discovery Oracle Big Data Discovery Turning Data into Business Value Harald Erb Oracle Business Analytics & Big Data 1 Safe Harbor Statement The following is intended to outline our general product direction. It

More information

SAS Publishing SAS. Forecast Studio 1.4. User s Guide

SAS Publishing SAS. Forecast Studio 1.4. User s Guide SAS Publishing SAS User s Guide Forecast Studio 1.4 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2006. SAS Forecast Studio 1.4: User s Guide. Cary, NC: SAS Institute

More information

DB2 SQL Class Outline

DB2 SQL Class Outline DB2 SQL Class Outline The Basics of SQL Introduction Finding Your Current Schema Setting Your Default SCHEMA SELECT * (All Columns) in a Table SELECT Specific Columns in a Table Commas in the Front or

More information

Using PROC SQL to Generate Shift Tables More Efficiently

Using PROC SQL to Generate Shift Tables More Efficiently ABSTRACT SESUG Paper 218-2018 Using PROC SQL to Generate Shift Tables More Efficiently Jenna Cody, IQVIA Shift tables display the change in the frequency of subjects across specified categories from baseline

More information
