Decision Making Procedure: Applications of IBM SPSS Cluster Analysis and Decision Tree


World Applied Sciences Journal 21 (8): 1207-1212, 2013
ISSN 1818-4952
IDOSI Publications, 2013
DOI: 10.5829/idosi.wasj.2013.21.8.2913

U.B. Baizyldayeva (1), R.K. Uskenbayeva (2) and S.T. Amanzholova (3)
(1) KIMEP, Almaty, Kazakhstan
(2) International IT University, Almaty, Kazakhstan
(3) K. Satpayev Kaz NTU, Almaty, Kazakhstan

Corresponding Author: U.B. Baizyldayeva, KIMEP, Polevaya 8 st., 050007 Almaty, Kazakhstan.

Abstract: Decision making is one of the most complex tasks faced by all types of businesses. When a decision depends on multi-criteria problem conditions, involves huge volumes of data and the data is represented in different formats, it becomes impossible to process the data without the aid of computer software. Today's market offers a variety of statistical software for solving different complex tasks. Moreover, useful introductory guidance is available on the diversity of software packages, classified as general statistics software, data mining software, mathematical computing packages, spatial statistics, econometrics packages, etc. Many free statistical software packages are also available today. In this article, IBM SPSS Statistics 19, with its cluster analysis and decision tree procedures, is taken as a tool for considering decision making problems. These two types of analysis can be applied in turn, sequentially, to the data of a given problem.

Key words: Cluster analysis, Decision tree, CHAID, Exhaustive CHAID, Classification and regression trees (C&RT), QUEST

INTRODUCTION

Modern business cannot be imagined without huge volumes of data, in different formats and collected for different goals, waiting for its turn to be analyzed and acted upon. Decision making problems are studied and solved with the aid of special statistical software [1, 2] and even spreadsheet facilities [3, 4].

In developing countries, most business companies and organizations use existing office spreadsheets for simple analysis, which is sometimes incomplete and does not provide highly visualized and reliable results.

It is also well known that many different mathematical and statistical approaches are applicable to tasks of analysis and decision making [5, 6]. Moreover, there are software packages that make it possible to solve different analytical and decision making problems at a fairly high level [7-9].

In this article, a decision making problem is set out together with a possible solution using IBM SPSS 19.

Cluster Analysis for Effective Decision Making: Let's consider the example of the direct marketing division of a company that wants to identify demographic groupings in its customer database to help determine marketing campaign strategies and develop new product offerings [10].

The data and variable views are shown in Pictures 1 and 2, respectively. The Direct Marketing technique "Segment my contacts into clusters" is the most applicable for the example dataset (Picture 3).

After specifying the variables used for dividing the customers into segments, we obtain the Model Summary (Picture 4). Activating the Cluster view option gives the display in Picture 5.

The Cluster view displays information on the attributes of each cluster. For continuous (scale) fields, the mean (average) value is displayed.
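The kind of per-cluster summary shown in a cluster view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (age, income category) and the cluster labels are hypothetical, and the code does not reproduce the SPSS clustering procedure itself. It only profiles clusters that have already been assigned, using the mean for a continuous field and the most frequent category for a categorical field.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records: (cluster label, age, income category).
# In practice the cluster labels would come from the clustering procedure.
records = [
    (1, 52, "high"),
    (1, 47, "medium"),
    (1, 57, "high"),
    (2, 29, "low"),
    (2, 34, "medium"),
    (2, 31, "low"),
]


def cluster_profile(rows):
    """Summarize each cluster: size, mean age, most frequent income category."""
    by_cluster = {}
    for cluster, age, income in rows:
        by_cluster.setdefault(cluster, []).append((age, income))

    profile = {}
    for cluster, members in sorted(by_cluster.items()):
        ages = [age for age, _ in members]
        incomes = [income for _, income in members]
        profile[cluster] = {
            "size": len(members),
            "mean_age": mean(ages),  # continuous field: mean
            "modal_income": Counter(incomes).most_common(1)[0][0],  # categorical field: mode
        }
    return profile


profile = cluster_profile(records)
for cluster, summary in profile.items():
    print(cluster, summary)
```

With these made-up records, cluster 1 comes out older and with a higher typical income category than cluster 2, which is the kind of contrast the cluster view is meant to expose.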

Picture 1: IBM SPSS Data view
Picture 2: IBM SPSS Variable view
Picture 3: Cluster Analysis

Picture 4: Model summary
Picture 5: Cluster view

For categorical (nominal, ordinal) fields, the mode is displayed. The mode is the category with the largest number of records. In this example, each record is a customer.

By default, fields are displayed in the order of their overall importance to the model. In this example, Age has the highest overall importance. You can also sort fields by within-cluster importance or in alphabetical order.

If you select (click) any cell in the Cluster view, you can see a chart that summarizes the values of that field for that cluster. For example, select the Age cell for cluster 1. For continuous fields, a histogram is displayed (Picture 6). The histogram displays both the distribution of values within that cluster and the overall distribution of values for the field. The histogram indicates that the customers in cluster 1 tend to be somewhat older.

Select the Income category cell for cluster 1 in the Cluster view. For categorical fields, a bar chart is displayed (Picture 7). The most notable feature of the income category bar chart for this cluster is the complete absence of any customers in the lowest income category.

The Cluster comparison view makes it possible to visualize the differences between clusters (Picture 8). The output obtained in tables and diagrams makes it easy to take a decision on the problem.

Picture 6: Cluster histogram
Picture 7: Cluster bar type chart
Picture 8: Cluster comparison view

Decision Tree for Effective Decision Making: IBM SPSS Decision Trees helps to better identify groups, discover relationships between them and predict future events. This module features highly visual classification and decision trees. These trees make it possible to present categorical results in an intuitive manner, so that categorical analysis can be clearly explained to non-technical audiences.

IBM SPSS Decision Trees makes it possible to explore results and visually determine how the model flows. This helps to find specific subgroups and relationships that would be impossible to find with more traditional statistics. Decision Trees is applicable for:

Database marketing
Market research
Credit risk scoring
Program targeting
Marketing in the public sector
Transportation problem

IBM SPSS Decision Trees provides specialized tree-building techniques for classification entirely within the IBM SPSS Statistics environment. It includes four established tree-growing algorithms:

CHAID: A fast, statistical, multi-way tree algorithm that explores data quickly and efficiently and builds segments and profiles with respect to the desired outcome. The CHAID algorithm only accepts nominal or ordinal categorical predictors. When predictors are continuous, they are transformed into ordinal predictors before the algorithm is applied. Both the CHAID and Exhaustive CHAID algorithms consist of three steps: merging, splitting and stopping. A tree is grown by repeatedly applying these three steps to each node, starting from the root node.

Exhaustive CHAID: A modification of CHAID, which examines all possible splits for each predictor.

Classification and regression trees (C&RT): A complete binary tree algorithm, which partitions data and produces accurate homogeneous subsets.

QUEST: A statistical algorithm that selects variables without bias and builds accurate binary trees quickly and efficiently.

With four algorithms available, it is possible to try different types of tree-growing algorithms and find the one that best fits your data.

Because classification trees are created directly within IBM SPSS Statistics, it is convenient to use the results to segment and group cases directly within the data. Additionally, it is possible to generate selection or classification/prediction rules in the form of IBM SPSS Statistics syntax, SQL statements or simple text (through syntax).

Decision tree output in the Viewer may be saved to an external file or saved as an XML model for later use to make predictions about individual and new cases.
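The merging step mentioned for CHAID can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the SPSS implementation: the counts are hypothetical, the predictor is treated as nominal (so any pair of categories may be compared) and the pair is chosen by the smallest Pearson chi-square statistic. Because every pairwise table here has the same shape, and therefore the same degrees of freedom, the smallest statistic corresponds to the least significant difference. A full CHAID implementation works with p-values, a merge threshold and repeated merging, none of which is shown here.

```python
from itertools import combinations


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts of (bad, good) credit ratings per income category.
counts = {
    "low": [90, 10],
    "medium": [50, 50],
    "high": [45, 55],
}


def most_similar_pair(category_counts):
    """Return the pair of categories whose target distributions differ least."""
    pairs = combinations(category_counts, 2)
    return min(
        pairs,
        key=lambda p: chi_square([category_counts[p[0]], category_counts[p[1]]]),
    )


pair = most_similar_pair(counts)
print(pair)
```

In this made-up example the "medium" and "high" categories have nearly the same mix of outcomes, so they are the candidates for merging, while "low" stays separate.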

Let's consider a hypothetical example data file containing demographic and bank loan history data (Picture 9).

Picture 9: IBM SPSS Variable view

To run a Decision Tree analysis, choose from the menus:

Analyze > Classify > Tree...

Set the dependent, independent and influence variables. For our case they are specified as follows:

Picture 10: Decision Tree Variables Specification

The resulting output view will look as in the schema below:

Picture 11: Decision Tree view

Picture 12: Risk estimation view

Risk
Estimate    Std. Error
.218        .008
Growing Method: CHAID
Dependent Variable: Credit rating

The risk estimate is 0.218, or 21.8%. This is the share of misclassified cases: of the 2464 cases in the classification table below, 212 + 325 = 537 are predicted incorrectly.

Finally, the classification of the observed cases is presented as a crosstabulation:

Picture 13: Classification Tree view

Classification
                               Predicted
Observed               Bad      Good     Percent Correct
Bad                    808       212     79.2%
Good                   325      1119     77.5%
Overall Percentage   46.0%     54.0%     78.2%
Growing Method: CHAID
Dependent Variable: Credit rating

These types of analysis and output views ease the decision making process for business and scientific problems of any type.

CONCLUSION

In this article we have considered one of the possible approaches to the decision making process for a concrete problem using IBM SPSS 19.

REFERENCES

1. Everitt, B.S., S. Landau, M. Leese and D. Stahl, 2010. Cluster Analysis (Wiley Series in Probability and Statistics), 5th Edition. Wiley.
2. SPSS Inc., 2010. IBM SPSS Statistics Student Version 18.0 for Windows and Mac OS X, 1st Edition.
3. Barlow, J.F., 2005. Excel models for business and operations management. Chichester, West Sussex; Hoboken, NJ: Wiley.
4. Albright, S.C., W.L. Winston and C. Zappe, with cases by M. Broadie... [et al.], 2006. Data analysis & decision making with Microsoft Excel. Australia; United Kingdom: Thomson South-Western.
5. Lewandowski, A. and A.P. Wierzbicki, eds., 1989. Aspiration based decision support systems: theory, software and applications. Berlin; New York: Springer-Verlag.
6. Laudon, K.C. and J.P. Laudon, 1991. Business information systems: a problem solving approach. Chicago: Dryden Press.
7. Amstatonline. URL: http://www.amstat.org/
8. Stata: Data Analysis and Statistical Software. URL: http://www.stata.com/
9. SAS: Business Analytics and Business Intelligence Software. URL: http://www.sas.com/
10. IBM SPSS Software. URL: http://www-142.ibm.com/software/products/us/en/spss-decisionmanagement